Expression monitoring for human cytomegalovirus (HCMV) infection

ABSTRACT

Certain human genes have been found to be induced or repressed in host cells infected with HCMV. A large set of such genes has been identified. These have diagnostic use in determining the extent of tissue damage caused by the infection as well as in determining the stage of disease progression of the HCMV infection. Such genes are likely those involved in mediating the pathology of the infected tissues. Thus, by identifying agents which are able to reverse the induction or repression of such genes, one can find candidate therapeutic agents for use in treating and/or preventing HCMV-caused disease pathologies.

BACKGROUND OF THE INVENTION

Many biological functions are accomplished by altering the expression of various genes through transcriptional (e.g. through control of initiation, provision of RNA precursors, RNA processing, etc.) and/or translational control. For example, fundamental biological processes such as cell cycle, cell differentiation and cell death are often characterized by variations in the expression levels of groups of genes.

Gene expression is also associated with pathogenesis. For example, the lack of sufficient expression of functional tumor suppressor genes and/or the overexpression of oncogenes/protooncogenes could lead to tumorigenesis (Marshall, Cell, 64:313-326 (1991); Weinberg, Science, 254:1138-1146 (1991), incorporated herein by reference for all purposes). Thus, changes in the expression levels of particular genes (e.g. oncogenes or tumor suppressors) serve as signposts for the presence and progression of various diseases.

The study of gene expression in the art has been generally concentrated on the regulatory regions of the gene of interest and on the relationships among a few genes. A number of transcriptional factors/DNA binding proteins have been identified and a limited number of regulatory pathways have been discovered. However, the expression of a particular gene is frequently regulated by the expression of a large number of other genes. The expression of those regulatory genes may also be under the control of additional genes. This complex regulatory relationship among genes constitutes a genetic network. The function and regulation of a particular gene can be best understood in the context of this genetic network. As the Human Genome Project and commercial genome research progress at a great rate, most, if not all, of the expressed genes will be partially sequenced in the near future. Understanding the functions and regulatory relationships among the large number of genes is becoming a difficult task with traditional tools. Therefore, there is a need in the art to develop a systematic approach to understand the complex regulatory relationships among large numbers of genes.

SUMMARY OF THE INVENTION

This invention provides methods, compositions, and apparatus for studying the complex regulatory relationships among host genes and viruses, in particular HCMV. In some of its specific applications, this invention provides methods, compositions, and apparatus for identifying drugs for preventing or ameliorating disease symptoms caused by HCMV. In other applications the invention provides methods for determining the stage of infection or the extent of tissue damage caused by HCMV infection. In another embodiment the invention provides a general method for narrowing large sets of genes which may be important down to smaller subsets of genes which have elevated probabilities of being biologically, physiologically, and medically relevant.

BRIEF DESCRIPTION OF THE DRAWINGS AND TABLE

FIG. 1. Characterization of RNA target samples and reproducibility of array-based hybridization results. (FIG. 1A) 82, 68 and 51 probe pairs were used to interrogate the 5′, middle and 3′ portions of the GAPDH mRNA, which is constitutively expressed in fibroblasts. (FIG. 1B, FIG. 1C) The plots compare the average difference intensities (Avg. Diff. Intensities) of the 20 probe pairs interrogating each of the genes present in two independent experiments performed on the mock-infected cells (FIG. 1B) or cells at 8 h after infection (FIG. 1C). The parallel lines flanking the center diagonal line indicate 3, 10 and 30-fold changes in intensity. With the exception of the thrombospondin-1 gene in the mock-infected control, all other genes demonstrated an average difference in their hybridization intensities of less than 3-fold.

FIG. 2A, FIG. 2B, FIG. 2C. Global survey of the differences in mRNA levels after HCMV infection.

The plots show the variation in expression levels (Avg. Diff. Intensities) between mock-infected cells and cells at 40 min, 8 h and 24 h after infection (FIG. 2A, FIG. 2B, FIG. 2C, respectively). Changes in expression of 3, 10, and 30-fold are highlighted by the parallel lines flanking the center diagonal line.

FIG. 3. Representative Northern blot analyses confirming changes in mRNA levels predicted by DNA array assay. Cultures of primary human diploid fibroblasts were infected with HCMV strain AD169 or Toledo, and total cellular RNA was analyzed by Northern blot at 40 min, 8 h and 24 h after infection. Genes to which the probes correspond are identified to the right of the autoradiograms. M, mock-infected cells.

FIGS. 4A and 4B (Table 1). Cellular mRNAs whose levels change by a factor of four or more after infection with HCMV. Identity of columns from left to right: GenBank accession number; name of gene encoding mRNA; time(s) after infection when a change in mRNA level was observed plus fold change; increase (U) or decrease (D) in steady state level of RNA; gene chip results confirmed in this report by Northern blot (1), confirmed by another literature report (2), not confirmed (3).

DETAILED DESCRIPTION OF THE INVENTION

1. Definitions

Bind(s) substantially: “Bind(s) substantially” refers to complementary hybridization between a probe nucleic acid and a target nucleic acid and embraces minor mismatches that can be accommodated by reducing the stringency of the hybridization media to achieve the desired detection of the target polynucleotide sequence.

Background: The terms “background” or “background signal intensity” refer to hybridization signals resulting from non-specific binding, or other interactions, between the labeled target nucleic acids and components of the oligonucleotide array (e.g., the oligonucleotide probes, control probes, the array substrate, etc.). Background signals may also be produced by intrinsic fluorescence of the array components themselves. A single background signal can be calculated for the entire array, or a different background signal may be calculated for each target nucleic acid. In a preferred embodiment, background is calculated as the average hybridization signal intensity for the lowest 5% to 10% of the probes in the array, or, where a different background signal is calculated for each target gene, for the lowest 5% to 10% of the probes for each gene. Of course, one of skill in the art will appreciate that where the probes to a particular gene hybridize well and thus appear to be specifically binding to a target sequence, they should not be used in a background signal calculation. Alternatively, background may be calculated as the average hybridization signal intensity produced by hybridization to probes that are not complementary to any sequence found in the sample (e.g. probes directed to nucleic acids of the opposite sense or to genes not found in the sample such as bacterial genes where the sample is mammalian nucleic acids). Background can also be calculated as the average signal intensity produced by regions of the array that lack any probes at all.
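The preferred background calculation described above (the average signal of the lowest 5% to 10% of probes) can be sketched in a few lines of Python. This is only an illustration: the function name `background_signal`, the `fraction` parameter, and the intensity values are hypothetical and are not taken from the specification.

```python
def background_signal(intensities, fraction=0.05):
    """Return the mean intensity of the lowest `fraction` of probes.

    `fraction` is assumed to lie in the 0.05 to 0.10 range named in the text.
    """
    if not intensities:
        raise ValueError("no probe intensities supplied")
    ranked = sorted(intensities)
    # Always use at least one probe, even for very small arrays.
    n_lowest = max(1, int(len(ranked) * fraction))
    lowest = ranked[:n_lowest]
    return sum(lowest) / len(lowest)


# Hypothetical intensities for a 20-probe array (arbitrary units).
probe_intensities = [
    12.0, 15.0, 9.0, 400.0, 850.0, 11.0, 1300.0, 10.0, 95.0, 60.0,
    14.0, 13.0, 700.0, 20.0, 18.0, 16.0, 250.0, 30.0, 8.0, 17.0,
]
background = background_signal(probe_intensities, fraction=0.10)
```

A per-gene background would apply the same function to the probes of each gene separately, after leaving out probes that appear to hybridize specifically.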

Cis-acting: The term “cis-acting” is used here to refer to the regulation of gene expression by a DNA subsequence in the same DNA molecule as the target gene. Cis-acting regulation can be exerted either by the binding of trans-acting transcriptional factors or by long range control.

Complexity: The term “complexity” is used here according to the standard meaning of this term as established by Britten et al. Methods of Enzymol. 29:363 (1974). See also Cantor and Schimmel, Biophysical Chemistry: Part III at 1228-1230 for further explanation of nucleic acid complexity.

Hybridizing specifically to: The phrase “hybridizing specifically to” refers to the binding, duplexing, or hybridizing of a molecule substantially to or only to a particular nucleotide sequence or sequences under stringent conditions when that sequence is present in a complex mixture (e.g., total cellular) DNA or RNA.

Introns: noncoding DNA sequences which separate neighboring coding regions. During gene transcription, introns, like exons, are transcribed into RNA but are subsequently removed by RNA splicing.

Massive Parallel Screening: The phrase “massively parallel screening” refers to the simultaneous screening of at least about 100, preferably about 1000, more preferably about 10,000 and most preferably about 1,000,000 different nucleic acid hybridizations.

Mismatch control: The term “mismatch control” or “mismatch probe” refers to a probe whose sequence is deliberately selected not to be perfectly complementary to a particular target sequence. For each mismatch (MM) control in a high-density array there typically exists a corresponding perfect match (PM) probe that is perfectly complementary to the same particular target sequence. The mismatch may comprise one or more bases. While the mismatch(es) may be located anywhere in the mismatch probe, terminal mismatches are less desirable as a terminal mismatch is less likely to prevent hybridization of the target sequence. In a particularly preferred embodiment, the mismatch is located at or near the center of the probe such that the mismatch is most likely to destabilize the duplex with the target sequence under the test hybridization conditions.
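As an illustration of the centrally located mismatch described above, the following Python sketch derives a mismatch probe from a perfect match probe by changing the single base at the center. Replacing that base with its complement is an assumption made only for this example; the definition above requires only that the probe not be perfectly complementary. The function name and the probe sequence are hypothetical.

```python
# Watson-Crick complements for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_probe(perfect_match):
    """Return a copy of the probe with its central base replaced.

    Assumption: the central base is swapped for its complement, which
    guarantees that the new base differs from the original one.
    """
    center = len(perfect_match) // 2
    swapped = COMPLEMENT[perfect_match[center]]
    return perfect_match[:center] + swapped + perfect_match[center + 1:]


# Hypothetical 25-mer perfect match (PM) probe and its mismatch (MM) partner.
pm = "ACGT" * 6 + "A"
mm = mismatch_probe(pm)
```

Here the two probes differ only at index 12, the middle of the 25-mer, which is the position the definition identifies as most likely to destabilize the duplex.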

mRNA or transcript: The term “mRNA” refers to transcripts of a gene. Transcripts are RNA including, for example, mature messenger RNA ready for translation and products of various stages of transcript processing. Transcript processing may include splicing, editing and degradation.

Nucleic Acid: The terms “nucleic acid” or “nucleic acid molecule” refer to a deoxyribonucleotide or ribonucleotide polymer in either single or double-stranded form, and unless otherwise limited, would encompass analogs of natural nucleotides that can function in a similar manner as naturally occurring nucleotides. An oligonucleotide is a single-stranded nucleic acid of 2 to n bases, where n may be greater than 500 to 1000. Nucleic acids may be cloned or synthesized using any technique known in the art. They may also include non-naturally occurring nucleotide analogs, such as those which are modified to improve hybridization, and peptide nucleic acids.

Nucleic acid encoding a regulatory molecule: The regulatory molecule may be DNA, RNA or protein. Thus, for example, DNA sites which bind protein or other nucleic acid molecules are included within the class of regulatory molecules encoded by a nucleic acid.

Perfect match probe: The term “perfect match probe” refers to a probe that has a sequence that is perfectly complementary to a particular target sequence. The test probe is typically perfectly complementary to a portion (subsequence) of the target sequence. The perfect match (PM) probe can be a “test probe”, a “normalization control” probe, an expression level control probe and the like. A perfect match control or perfect match probe is, however, distinguished from a “mismatch control” or “mismatch probe.”

Probe: As used herein a “probe” is defined as a nucleic acid capable of binding to a target nucleic acid of complementary sequence through one or more types of chemical bonds, usually through complementary base pairing, usually through hydrogen bond formation. As used herein, a probe may include natural (i.e. A, G, U, C, or T) or modified bases (7-deazaguanosine, inosine, etc.). In addition, the bases in probes may be joined by a linkage other than a phosphodiester bond, so long as it does not interfere with hybridization. Thus, probes may be peptide nucleic acids in which the constituent bases are joined by peptide bonds rather than phosphodiester linkages.

Target nucleic acid: The term “target nucleic acid” refers to a nucleic acid (often derived from a biological sample) to which the probe is designed to specifically hybridize. It is either the presence or absence of the target nucleic acid that is to be detected, or the amount of the target nucleic acid that is to be quantified. The target nucleic acid has a sequence that is complementary to the nucleic acid sequence of the corresponding probe directed to the target. The term target nucleic acid may refer to the specific subsequence of a larger nucleic acid to which the probe is directed or to the overall sequence (e.g., gene or mRNA) whose expression level it is desired to detect. The difference in usage will be apparent from context.

Trans-acting: The term “trans-acting” refers to regulation of gene expression by a product that is encoded by a gene at a remote location, usually as a result of binding to a cis-element.

Stringent conditions: The term “stringent conditions” refers to conditions under which a probe will hybridize to its target subsequence, but with only insubstantial hybridization to other sequences or to other sequences such that the difference may be identified. Stringent conditions are sequence-dependent and will be different in different circumstances. Longer sequences hybridize specifically at higher temperatures. Generally, stringent conditions are selected to be about 5° C. lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH.

Subsequence: “Subsequence” refers to a sequence of nucleic acids that comprises a part of a longer sequence of nucleic acids.

Thermal melting point (Tm): The Tm is the temperature, under defined ionic strength, pH, and nucleic acid concentration, at which 50% of the probes complementary to the target sequence hybridize to the target sequence at equilibrium. (As the target sequences are generally present in excess, at Tm, 50% of the probes are occupied at equilibrium.) Typically, stringent conditions will be those in which the salt concentration is at least about 0.01 to 1.0 M Na ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30° C. for short probes (e.g., 10 to 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide.
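To make the relationship between Tm and stringent conditions concrete, the sketch below estimates Tm for a short probe and subtracts the roughly 5° C. offset mentioned above. The specification does not say how Tm is to be obtained; the Wallace rule used here (2° C. per A or T, 4° C. per G or C) is a common rule of thumb for short oligonucleotides and is substituted purely for illustration. It ignores the ionic strength, pH, and concentration on which the definition depends. The function names and the probe sequence are hypothetical.

```python
def wallace_tm_celsius(probe):
    """Rough Tm estimate for a short oligonucleotide (Wallace rule).

    Assumption: 2 degrees C per A/T base and 4 degrees C per G/C base.
    This is an approximation, not the method of the specification.
    """
    probe = probe.upper()
    at_count = probe.count("A") + probe.count("T")
    gc_count = probe.count("G") + probe.count("C")
    return 2 * at_count + 4 * gc_count


def stringent_temperature(probe, offset=5):
    """Temperature about `offset` degrees C below the estimated Tm."""
    return wallace_tm_celsius(probe) - offset


# Hypothetical 16-mer probe with 8 A/T bases and 8 G/C bases.
probe = "ACGTACGTACGTACGT"
tm_estimate = wallace_tm_celsius(probe)
hybridization_temperature = stringent_temperature(probe)
```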

Quantifying: The term “quantifying” when used in the context of quantifying transcription levels of a gene can refer to absolute or to relative quantification. Absolute quantification may be accomplished by inclusion of known concentration(s) of one or more target nucleic acids (e.g. control nucleic acids such as Bio B, or known amounts of the target nucleic acids themselves) and referencing the hybridization intensity of unknowns with the known target nucleic acids (e.g. through generation of a standard curve). Alternatively, relative quantification can be accomplished by comparison of hybridization signals between two or more genes, or between two or more treatments, to quantify the changes in hybridization intensity and, by implication, transcription level.
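The two modes of quantification can be illustrated with a short Python sketch. It assumes, for illustration only, that hybridization intensity is linear in concentration over the range of the spiked controls, so that a least-squares line can serve as the standard curve. All names and numbers are hypothetical.

```python
def fit_standard_curve(concentrations, intensities):
    """Least-squares line: intensity = slope * concentration + intercept."""
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(intensities) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations, intensities))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def absolute_concentration(intensity, slope, intercept):
    """Absolute quantification: read a concentration off the standard curve."""
    return (intensity - intercept) / slope


def fold_change(treated_intensity, control_intensity):
    """Relative quantification: ratio of signals between two treatments."""
    return treated_intensity / control_intensity


# Hypothetical spiked control concentrations (pM) and measured intensities.
spike_pm = [1.0, 2.0, 4.0, 8.0]
spike_intensity = [110.0, 210.0, 410.0, 810.0]
slope, intercept = fit_standard_curve(spike_pm, spike_intensity)

unknown_pm = absolute_concentration(510.0, slope, intercept)
relative = fold_change(800.0, 200.0)
```

If the response were not linear, a different curve model would be needed; the specification does not prescribe one.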

Sequence identity: The “percentage of sequence identity” or “sequence identity” is determined by comparing two optimally aligned sequences or subsequences over a comparison window or span, wherein the portion of the polynucleotide sequence in the comparison window may optionally comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical subunit (e.g. nucleic acid base or amino acid residue) occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the result by 100 to yield the percentage of sequence identity. Percentage sequence identity when calculated using the programs GAP or BESTFIT (see below) is calculated using default gap weights.

Methods of alignment of sequences for comparison are well known in the art. Optimal alignment of sequences for comparison may be conducted by the local homology algorithm of Smith and Waterman, Adv. Appl. Math. 2: 482 (1981), by the homology alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48: 443 (1970), by the search for similarity method of Pearson and Lipman, Proc. Natl. Acad. Sci. USA 85: 2444 (1988), by computerized implementations of these algorithms (including, but not limited to, CLUSTAL in the PC/Gene program by Intelligenetics, Mountain View, Calif., and GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group (GCG), 575 Science Dr., Madison, Wis., USA), or by inspection. In particular, methods for aligning sequences using the CLUSTAL program are well described by Higgins and Sharp in Gene, 73: 237-244 (1988) and in CABIOS 5: 151-153 (1989).
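The percentage calculation in the sequence identity definition can be written directly in Python once the two sequences have been aligned. This sketch assumes the alignment has already been produced by one of the methods listed above and that gaps are written as "-"; it does not perform the alignment. The function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Matched positions / total positions in the window, times 100.

    Both inputs must be the same length, i.e. already aligned.
    A position counts as matched only if both sequences carry the
    same non-gap subunit there.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matched = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * matched / len(aligned_a)


# Hypothetical 10-position comparison window with one gap and one mismatch.
identity = percent_identity("ACGT-ACGTA", "ACGTTACGAA")
```

In this window 8 of the 10 positions match, giving 80% identity.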

Up-stream or down-stream gene: If the expression of a first gene is regulated by a second gene, the second gene is called an “up-stream gene” for the first gene and the first gene is the “down-stream” gene of the second gene. The regulation of the first gene by the second gene could be through trans-activation. For example, the second gene encodes a transcriptional factor that controls the expression of the first gene. The regulation can also be exerted by cis-acting. For example, the second gene is in the proximity of the first gene and exerts a positional effect on the expression of the first gene. In this case, the second gene does not have to be expressed in order to have an influence on the first gene.

According to the present invention, the stage of disease caused by HCMV infection can be determined. Expression levels of one or more genes which are induced or repressed by HCMV are determined in a first human cell sample. The first human cell sample comprises cells of a patient infected with HCMV and consists essentially of HCMV-infected cells. The expression levels of the one or more genes correlate with the stage of disease progression of the HCMV infection.

According to another embodiment of the invention the extent of tissue damage caused by HCMV infection is determined. Expression levels of one or more genes which are induced or repressed by HCMV are determined in a first human cell sample. The first human cell sample comprises cells of a patient infected with HCMV and consists essentially of HCMV-infected cells. The expression levels of the one or more genes correlate with the extent of tissue damage caused by the HCMV infection.

According to yet another aspect of the invention candidate drugs for preventing or ameliorating disease symptoms caused by HCMV are identified. Human cells are contacted with HCMV and a test agent. The contacting of the test agent and the HCMV with the cells can be at the same time or sequentially. Expression levels of one or more genes which are induced or repressed by HCMV are determined. Test agents are identified as candidate drugs if the test agents cause the human cells to express the one or more genes at the same level (i.e., within 10-50% of the level) at which the human cells express the one or more genes in the absence of HCMV infection.
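The selection criterion for candidate drugs can be sketched as a simple comparison against the uninfected baseline. The sketch assumes that "within 10-50%" means the treated level lies within a chosen fractional tolerance of the uninfected level, and that every monitored gene must meet the criterion; both are interpretations, not statements from the specification. Gene labels and expression values are hypothetical.

```python
def restores_baseline(treated_level, uninfected_level, tolerance=0.10):
    """True if the treated level is within `tolerance` of the baseline.

    `tolerance` is a fraction; the text names a range of 0.10 to 0.50.
    """
    return abs(treated_level - uninfected_level) <= tolerance * uninfected_level


def is_candidate_drug(treated, uninfected, tolerance=0.10):
    """True if every monitored gene returns to its uninfected level."""
    return all(
        restores_baseline(treated[gene], uninfected[gene], tolerance)
        for gene in uninfected
    )


# Hypothetical expression levels (arbitrary units) for two monitored genes.
uninfected = {"gene_A": 100.0, "gene_B": 200.0}
infected_plus_agent_1 = {"gene_A": 105.0, "gene_B": 190.0}
infected_plus_agent_2 = {"gene_A": 400.0, "gene_B": 195.0}

agent_1_is_candidate = is_candidate_drug(infected_plus_agent_1, uninfected)
agent_2_is_candidate = is_candidate_drug(infected_plus_agent_2, uninfected)
```

With a 10% tolerance, the first agent qualifies and the second does not, because gene_A remains four-fold above its uninfected level.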

The genes whose expression levels are tested are those which are induced or repressed by HCMV. These are preferably those which are induced or repressed to a level which is at least two-fold, four-fold, eight-fold, ten-fold, or fifteen-fold different than the level of expression in the absence of HCMV. More preferably the genes are selected from those genes identified in Table 1 as repressed or induced by HCMV. Of those genes which are identified in Table 1, HLA-E, Ro/SSA, lipocortin-1, cPLA2, COX-2, thrombospondin-1, and MITF are preferred.
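A fold-change filter of the kind described above might be sketched as follows. The up (U) and down (D) labels follow the convention of Table 1; the threshold default, the gene labels, and the intensity values are hypothetical, and the sketch assumes strictly positive intensities.

```python
def fold_change(infected, mock):
    """Return (fold, direction), with direction 'U' (up) or 'D' (down).

    The fold value is always >= 1, so induction and repression can be
    compared against the same threshold. Intensities must be positive.
    """
    if infected >= mock:
        return infected / mock, "U"
    return mock / infected, "D"


def select_changed_genes(infected, mock, threshold=4.0):
    """Keep genes whose level differs by at least `threshold`-fold."""
    selected = {}
    for gene, mock_level in mock.items():
        fold, direction = fold_change(infected[gene], mock_level)
        if fold >= threshold:
            selected[gene] = (fold, direction)
    return selected


# Hypothetical intensities (arbitrary units); not data from Table 1.
mock = {"gene_A": 50.0, "gene_B": 800.0, "gene_C": 120.0}
infected = {"gene_A": 400.0, "gene_B": 100.0, "gene_C": 150.0}

changed = select_changed_genes(infected, mock, threshold=4.0)
```

Here gene_A is reported as induced eight-fold, gene_B as repressed eight-fold, and gene_C is dropped because its change is below the four-fold threshold.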

According to another aspect of the invention a general paradigm for identifying subsets of biologically, physiologically, or medically relevant genes is provided. This paradigm permits the prioritization of attention, investigation, and research on those genes which are most likely to have biological, physiological, or medical relevance. The paradigm involves the combination of expression data with other types of information contained within databases, including the general scientific literature, the patent literature, nucleotide and amino acid sequence databases, etc. By systematically searching databases for information which may shed light on the source of the expression data, one can make new connections and correlations which will confirm or suggest heightened biological, physiological, or medical relevance. The discussion section of the Example provides multiple examples of the operation of this paradigm. Generally, expression levels of at least two genes are compared between two cell samples. The two cell samples are the same but for a selected environmental, genetic, or developmental difference. These include without limitation contact with a drug or other exogenous chemical agent; temperature difference; mutation; viral infection; developmental stage difference; and bacterial infection. A set of genes whose expression levels differ between the two cell samples is identified. A database is searched to identify an environmental agent, gene, disease, biological phenomenon, or developmental stage previously associated with expression or loss of expression of individual members of the set of genes. As exemplified below, these may include without limitation an immunological reaction, a biochemical pathway, knock-out experimental animals, mutant animals or cells, and diseases.
When a common biological feature is identified between the selected environmental, genetic, or developmental difference used between the two sets of cells and the unselected environmental agent, gene, disease, or developmental stage identified from the database, the common gene is identified as being a member of a subset of genes which are improved targets for drug development. A common biological feature may be, for example, an association with a symptom of a disease, a phenotype, a reaction, etc. A computer-readable medium having computer-executable instructions may also be used for performing the steps of this paradigm. Such instructions may be stored on disk or other suitable medium, as is well known in the art.
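Because the paradigm is said to be suitable for computer-executable instructions, a minimal Python sketch of its final step is given here. It assumes that the database search has already been reduced to a mapping from each gene to a set of previously associated biological features, and that the selected difference (for example, a viral infection) is described by its own set of features. The data structures, names, and feature labels are all hypothetical placeholders and do not represent findings of the specification.

```python
def prioritize_genes(changed_genes, database_annotations, condition_features):
    """Return genes whose database annotations share a feature with
    the selected environmental, genetic, or developmental difference.

    changed_genes:        genes whose expression differs between samples
    database_annotations: gene -> set of features found in the database
    condition_features:   features associated with the selected difference
    """
    prioritized = {}
    for gene in changed_genes:
        shared = database_annotations.get(gene, set()) & condition_features
        if shared:
            prioritized[gene] = shared
    return prioritized


# Hypothetical inputs; feature labels are placeholders, not findings.
changed_genes = ["gene_A", "gene_B", "gene_C"]
database_annotations = {
    "gene_A": {"feature_1", "feature_2"},
    "gene_B": {"feature_3"},
    # gene_C has no database entry.
}
condition_features = {"feature_2", "feature_4"}

targets = prioritize_genes(changed_genes, database_annotations, condition_features)
```

Only gene_A shares a feature with the selected condition, so it alone enters the prioritized subset.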

The method by which expression levels are determined is not critical to the invention. Either mRNA or protein expression from one or more genes may be determined. Any method known in the art for determining such expression levels can be used. These include without limitation hybridization to an array of oligonucleotides, serial analysis of gene expression, hybridization on a solid support, and immunological assays, such as Western blot, ELISA, and immunohistochemistry. mRNA can be collected from the human cells, reverse transcribed, and used as a template for amplification. Any detection means known in the art can be used, including but not limited to use of a radioactive label, a fluorescent label, a chromophoric label, or an enzymatic label.

Various techniques known in the art render the screening of large numbers of genes' expression relatively straightforward. Thus, although a single gene's expression can be determined and can provide diagnostic and prognostic information, multiple genes can also be tested for their expression levels. In some embodiments at least 2, 5, 10, 15, 25, 50, 100, 150, 200, 350, 500, or 1000 genes are tested to determine their expression levels.

Cell samples for use in the assays of the present invention can be of any cell type which is infected by HCMV. These include but are not limited to human fibroblasts, lymphocytes, epithelial cells, lung epithelial cells, and neuronal cells. When comparisons are done between an HCMV-infected cell sample and an uninfected cell sample, preferably the two samples are of the same cell type. However, this is not always possible. Comparisons can also be done with consensus expression profiles determined using a population of normal, uninfected cells.

Once expression levels of one or more genes are determined in a clinical human cell sample from HCMV-infected cells, they can be compared to the level found in uninfected controls, whether from the same individual and preferably the same type of cell, or from other individuals. A level of induction or repression greater than a predetermined threshold can be correlated with a stage of disease progression or extent of tissue damage. Such correlation can be made by reference to a standard curve of stage of disease progression or extent of tissue damage generated as a function of gene expression levels.

The methods of the invention involve quantifying the level of expression of a large number of genes. In some preferred embodiments, a high density oligonucleotide array is used to hybridize with a target nucleic acid sample to detect the expression level of a large number of genes, preferably more than 10, more preferably more than 100, and most preferably more than 1000 genes.

Activity of a gene is reflected by the activity of its product(s): the proteins or other molecules encoded by the gene. Those product molecules perform biological functions. Directly measuring the activity of a gene product is, however, often difficult for certain genes. Instead, the immunological activities or the amount of the final product(s) or its peptide processing intermediates are determined as a measurement of the gene activity. More frequently, the amount or activity of intermediates, such as transcripts, RNA processing intermediates, or mature mRNAs, are detected as a measurement of gene activity.

In many cases, the form and function of the final product(s) of a gene is unknown. In those cases, the activity of a gene is measured conveniently by the amount or activity of transcript(s), RNA processing intermediate(s), mature mRNA(s) or its protein product(s) or functional activity of its protein product(s).

Any methods that measure the activity of a gene are useful for at least some embodiments of this invention. For example, traditional Northern blotting and hybridization, nuclease protection, RT-PCR and differential display have been used for detecting gene activity. Those methods are useful for some embodiments of the invention. However, this invention is most useful in conjunction with methods for detecting the expression of a large number of genes.

High density arrays are particularly useful for monitoring the expression control at the transcriptional, RNA processing and degradation level. The fabrication and application of high density arrays in gene expression monitoring have been disclosed previously in, for example, WO 97/10365, WO 92/10588, U.S. application Ser. No. 08/772,376 filed Dec. 23, 1996; Ser. No. 08/529,115 filed on Sep. 15, 1995; Ser. No. 08/168,904 filed Dec. 15, 1993; Ser. No. 07/624,114 filed on Dec. 6, 1990; Ser. No. 07/362,901 filed Jun. 7, 1990, all incorporated herein for all purposes by reference. In some embodiments using high density arrays, high density oligonucleotide arrays are synthesized using methods such as the Very Large Scale Immobilized Polymer Synthesis (VLSIPS) disclosed in U.S. Pat. No. 5,445,934, incorporated herein for all purposes by reference. Each oligonucleotide occupies a known location on a substrate. A nucleic acid target sample is hybridized with a high density array of oligonucleotides and then the amount of target nucleic acids hybridized to each probe in the array is quantified. One preferred quantifying method is to use a confocal microscope and fluorescent labels. The GeneChip® system (Affymetrix, Santa Clara, Calif.) is particularly suitable for quantifying the hybridization; however, it will be apparent to those of skill in the art that any similar systems or other effectively equivalent detection methods can also be used.

High density arrays are suitable for quantifying small variations in expression levels of a gene in the presence of a large population of heterogeneous nucleic acids. Such high density arrays can be fabricated either by de novo synthesis on a substrate or by spotting or transporting nucleic acid sequences onto specific locations of a substrate. Nucleic acids are purified and/or isolated from biological materials, such as a bacterial plasmid containing a cloned segment of a sequence of interest. Suitable nucleic acids are also produced by amplification of templates. As a nonlimiting illustration, polymerase chain reaction and/or in vitro transcription are suitable nucleic acid amplification methods.

Synthesized oligonucleotide arrays are particularly preferred for this invention. Oligonucleotide arrays have numerous advantages over other methods, such as efficiency of production, reduced intra- and inter-array variability, increased information content and high signal-to-noise ratio.

Preferred high density arrays for gene function identification and genetic network mapping comprise greater than about 100, preferably greater than about 1000, more preferably greater than about 16,000 and most preferably greater than 65,000 or 250,000 or even greater than about 1,000,000 different oligonucleotide probes, preferably in less than 1 cm² of surface area. The oligonucleotide probes range from about 5 to about 50 or about 500 nucleotides, more preferably from about 10 to about 40 nucleotides and most preferably from about 15 to about 40 nucleotides in length.

Massive Parallel Gene Expression Monitoring

One preferred method for massive parallel gene expression monitoring is based upon high density nucleic acid arrays. Nucleic acid array methods for monitoring gene expression are disclosed and discussed in detail in PCT Application WO 92/10588 (published on Jun. 25, 1992), incorporated herein by reference for all purposes.

Generally those methods of monitoring gene expression involve (a) providing a pool of target nucleic acids comprising RNA transcript(s) of one or more target gene(s), or nucleic acids derived from the RNA transcript(s); (b) hybridizing the nucleic acid sample to a high density array of probes; and (c) detecting the hybridized nucleic acids and calculating a relative and/or absolute expression (transcription, RNA processing or degradation) level.
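Step (c) can be illustrated with a short Python sketch. The figure descriptions refer to "average difference intensities" computed over the probe pairs for each gene; the sketch assumes this means the mean of the perfect match (PM) minus mismatch (MM) intensities across those pairs, which is one plausible reading and is not spelled out in this section. The function names and intensity values are hypothetical.

```python
def average_difference(pm_intensities, mm_intensities):
    """Mean of (PM - MM) over the probe pairs interrogating one gene.

    Assumption: this is what 'average difference intensity' denotes.
    """
    if len(pm_intensities) != len(mm_intensities) or not pm_intensities:
        raise ValueError("need one MM intensity per PM intensity")
    differences = [pm - mm for pm, mm in zip(pm_intensities, mm_intensities)]
    return sum(differences) / len(differences)


def relative_expression(sample_avg_diff, reference_avg_diff):
    """Relative expression level as a ratio of average differences."""
    return sample_avg_diff / reference_avg_diff


# Hypothetical intensities for four probe pairs of one gene.
pm = [500.0, 620.0, 480.0, 550.0]
mm = [100.0, 120.0, 80.0, 150.0]

infected_avg_diff = average_difference(pm, mm)
ratio_to_mock = relative_expression(infected_avg_diff, 85.0)
```

A real analysis would typically use more probe pairs per gene (the figure descriptions mention 20) and would subtract background before taking differences.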

(A). Providing a Nucleic Acid Sample

One of skill in the art will appreciate that it is desirable to have nucleic acid samples containing target nucleic acid sequences that reflect the transcripts of interest. Therefore, suitable nucleic acid samples may contain transcripts of interest. Suitable nucleic acid samples, however, may also contain nucleic acids derived from the transcripts of interest. As used herein, a nucleic acid derived from a transcript refers to a nucleic acid for whose synthesis the mRNA transcript or a subsequence thereof has ultimately served as a template. Thus, a cDNA reverse transcribed from a transcript, an RNA transcribed from that cDNA, a DNA amplified from the cDNA, an RNA transcribed from the amplified DNA, etc., are all derived from the transcript, and detection of such derived products is indicative of the presence and/or abundance of the original transcript in a sample. Thus, suitable samples include, but are not limited to, transcripts of the gene or genes, cDNA reverse transcribed from the transcript, cRNA transcribed from the cDNA, DNA amplified from the genes, RNA transcribed from amplified DNA, and the like.

Transcripts, as used herein, may include, but are not limited to, pre-mRNA nascent transcript(s), transcript processing intermediates, mature mRNA(s) and degradation products. It is not necessary to monitor all types of transcripts to practice this invention. For example, one may choose to practice the invention to measure the mature mRNA levels only.

In one embodiment, such a sample is a homogenate of cells or tissues or other biological samples. Preferably, such a sample is a total RNA preparation of a biological sample. More preferably in some embodiments, such a nucleic acid sample is the total mRNA isolated from a biological sample. Those of skill in the art will appreciate that the total mRNA prepared with most methods includes not only the mature mRNA, but also the RNA processing intermediates and nascent pre-mRNA transcripts. For example, total mRNA purified with a poly (dT) column contains RNA molecules with poly (A) tails. Those polyA⁺ RNA molecules could be mature mRNA, RNA processing intermediates, nascent transcripts or degradation intermediates.

Biological samples may be of any biological tissue or fluid or cells from any organism. Frequently the sample will be a “clinical sample” which is a sample derived from a patient. Clinical samples provide a rich source of information regarding the various states of the genetic network or gene expression. Some embodiments of the invention are employed to detect mutations and to identify the phenotype of mutations. Such embodiments have extensive applications in clinical diagnostics and clinical studies. Typical clinical samples include, but are not limited to, sputum, blood, blood cells (e.g., white cells), tissue or fine needle biopsy samples, urine, peritoneal fluid, and pleural fluid, or cells therefrom.

Biological samples may also include sections of tissues, such as frozen sections or formalin fixed sections taken for histological purposes.

Another typical source of biological samples is cell cultures, where gene expression states can be manipulated to explore the relationships among genes. In one aspect of the invention, methods are provided to generate biological samples reflecting a wide variety of states of the genetic network.

One of skill in the art would appreciate that it is desirable to inhibit or destroy RNase present in homogenates before homogenates can be used for hybridization. Methods of inhibiting or destroying nucleases are well known in the art. In some preferred embodiments, cells or tissues are homogenized in the presence of chaotropic agents to inhibit nucleases. In some other embodiments, RNase is inhibited or destroyed by heat treatment followed by proteinase treatment.

Methods of isolating total mRNA are also well known to those of skill in the art. For example, methods of isolation and purification of nucleic acids are described in detail in Chapter 3 of Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization With Nucleic Acid Probes, Part I. Theory and Nucleic Acid Preparation, P. Tijssen, ed. Elsevier, N.Y. (1993).

In a preferred embodiment, the total RNA is isolated from a given sample using, for example, an acid guanidinium-phenol-chloroform extraction method and polyA⁺ mRNA is isolated by oligo(dT) column chromatography or by using (dT) on magnetic beads (see, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual (2nd ed.), Vols. 1-3, Cold Spring Harbor Laboratory, (1989), or Current Protocols in Molecular Biology, F. Ausubel et al., ed. Greene Publishing and Wiley-Interscience, New York (1987)).

Frequently, it is desirable to amplify the nucleic acid sample prior to hybridization. One of skill in the art will appreciate that whatever amplification method is used, if a quantitative result is desired, care must be taken to use a method that maintains or controls for the relative frequencies of the amplified nucleic acids to achieve quantitative amplification.

Methods of “quantitative” amplification are well known to those of skill in the art. For example, quantitative PCR involves simultaneously co-amplifying a known quantity of a control sequence using the same primers. This provides an internal standard that may be used to calibrate the PCR reaction. The high density array may then include probes specific to the internal standard for quantification of the amplified nucleic acid.

One preferred internal standard is a synthetic AW106 cRNA. The AW106 cRNA is combined with RNA isolated from the sample according to standard techniques known to those of skill in the art. The RNA is then reverse transcribed using a reverse transcriptase to provide copy DNA. The cDNA sequences are then amplified (e.g., by PCR) using labeled primers. The amplification products are separated, typically by electrophoresis, and the amount of radioactivity (proportional to the amount of amplified product) is determined. The amount of mRNA in the sample is then calculated by comparison with the signal produced by the known AW106 RNA standard. Detailed protocols for quantitative PCR are provided in PCR Protocols, A Guide to Methods and Applications, Innis et al., Academic Press, Inc. N.Y., (1990).
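The comparison against a co-amplified internal standard can be illustrated with a minimal sketch. It assumes that the target and the standard amplify with equal efficiency and that the measured signal is proportional to the amount of product; the function name and the numbers are hypothetical and are not taken from the cited protocols.

```python
def estimate_target_amount(target_signal, standard_signal, standard_amount):
    """Estimate the starting amount of a target from its signal relative to
    the signal of a co-amplified internal standard of known amount.

    Assumes equal amplification efficiency and a linear signal response.
    """
    if standard_signal <= 0:
        raise ValueError("standard signal must be positive")
    return standard_amount * (target_signal / standard_signal)


# Hypothetical readings: 1000 copies of standard produced 2000 counts,
# and the target band produced 500 counts.
print(estimate_target_amount(500.0, 2000.0, 1000.0))
```

If the efficiencies differ, a calibration series of the standard would be needed instead of a single ratio.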

Other suitable amplification methods include, but are not limited to, polymerase chain reaction (PCR) (Innis, et al., PCR Protocols. A guide to Methods and Application. Academic Press, Inc. San Diego, (1990)), ligase chain reaction (LCR) (see Wu and Wallace, Genomics, 4: 560 (1989), Landegren, et al., Science, 241: 1077 (1988) and Barringer, et al., Gene, 89: 117 (1990)), transcription amplification (Kwoh, et al., Proc. Natl. Acad. Sci. USA, 86: 1173 (1989)), and self-sustained sequence replication (Guatelli, et al., Proc. Natl. Acad. Sci. USA, 87: 1874 (1990)).

Cell lysates or tissue homogenates often contain a number of inhibitors of polymerase activity. Therefore, RT-PCR typically incorporates preliminary steps to isolate total RNA or mRNA for subsequent use as an amplification template. A one-tube mRNA capture method may be used to prepare poly(A)⁺ RNA samples suitable for immediate RT-PCR in the same tube (Boehringer Mannheim). The captured mRNA can be directly subjected to RT-PCR by adding a reverse transcription mix and, subsequently, a PCR mix.

In a particularly preferred embodiment, the sample mRNA is reverse transcribed with a reverse transcriptase and a primer consisting of oligo(dT) and a sequence encoding the phage T7 promoter to provide a single stranded DNA template. The second DNA strand is polymerized using a DNA polymerase. After synthesis of double-stranded cDNA, T7 RNA polymerase is added and RNA is transcribed from the cDNA template. Successive rounds of transcription from each single cDNA template result in amplified RNA. Methods of in vitro polymerization are well known to those of skill in the art (see, e.g., Sambrook, supra) and this particular method is described in detail by Van Gelder, et al., Proc. Natl. Acad. Sci. USA, 87: 1663-1667 (1990), who demonstrate that in vitro amplification according to this method preserves the relative frequencies of the various RNA transcripts. Moreover, Eberwine et al., Proc. Natl. Acad. Sci. USA, 89: 3010-3014 (1992), provide a protocol that uses two rounds of amplification via in vitro transcription to achieve greater than 10⁶ fold amplification of the original starting material, thereby permitting expression monitoring even where biological samples are limited.

It will be appreciated by one of skill in the art that the direct transcription method described above provides an antisense (aRNA) pool. Where antisense RNA is used as the target nucleic acid, the oligonucleotide probes provided in the array are chosen to be complementary to subsequences of the antisense nucleic acids. Conversely, where the target nucleic acid pool is a pool of sense nucleic acids, the oligonucleotide probes are selected to be complementary to subsequences of the sense nucleic acids. Finally, where the nucleic acid pool is double stranded, the probes may be of either sense, as the target nucleic acids include both sense and antisense strands.
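The relationship between the sense of the target pool and the probe sequence can be sketched as a reverse-complement operation. The helper below is a hypothetical illustration, not part of any cited protocol; it accepts an RNA or DNA target subsequence written 5′ to 3′ and returns a DNA probe written 5′ to 3′.

```python
# Map each target base to its complementary DNA base (U in RNA pairs with A).
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def probe_for_target(target_subsequence):
    """Return the DNA probe (5'->3') complementary to a target (5'->3')."""
    return target_subsequence.upper().translate(_COMPLEMENT)[::-1]


sense_mrna = "AUGGC"       # hypothetical sense fragment
antisense_rna = "GCCAU"    # its antisense (aRNA) counterpart

print(probe_for_target(sense_mrna))     # probe for a sense target pool
print(probe_for_target(antisense_rna))  # probe for an antisense target pool
```

Note that a probe directed to the antisense pool has the same base order as the original sense transcript, written as DNA.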

The protocols cited above include methods of generating pools of either sense or antisense nucleic acids. Indeed, one approach can be used to generate either sense or antisense nucleic acids as desired. For example, the cDNA can be directionally cloned into a vector (e.g., Stratagene's pBluescript II KS (+) phagemid) such that it is flanked by the T3 and T7 promoters. In vitro transcription with the T3 polymerase will produce RNA of one sense (the sense depending on the orientation of the insert), while in vitro transcription with the T7 polymerase will produce RNA having the opposite sense. Other suitable cloning systems include phage lambda vectors designed for Cre-loxP plasmid subcloning (see e.g., Palazzolo et al., Gene, 88: 25-36 (1990)).

(B) Hybridizing Nucleic Acids to High Density Arrays

1. Probe Design

One of skill in the art will appreciate that an enormous number of array designs are suitable for the practice of this invention. The high density array will typically include a number of probes that specifically hybridize to the sequences of interest. In addition, in a preferred embodiment, the array will include one or more control probes.

The high density array chip includes “test probes.” Test probes could be oligonucleotides that range from about 5 to about 45 or 5 to about 500 nucleotides, more preferably from about 10 to about 40 nucleotides and most preferably from about 15 to about 40 nucleotides in length. In other particularly preferred embodiments the probes are 20 or 25 nucleotides in length. In other preferred embodiments, test probes are double or single strand DNA sequences. DNA sequences are isolated or cloned from natural sources or amplified from natural sources using natural nucleic acid as templates. These probes have sequences complementary to particular subsequences of the genes whose expression they are designed to detect. Thus, the test probes are capable of specifically hybridizing to the target nucleic acid they are to detect.

In addition to test probes that bind the target nucleic acid(s) of interest, the high density array can contain a number of control probes. The control probes fall into three categories referred to herein as 1) normalization controls; 2) expression level controls; and 3) mismatch controls.

Normalization controls are oligonucleotide or other nucleic acid probes that are complementary to labeled reference oligonucleotides or other nucleic acid sequences that are added to the nucleic acid sample. The signals obtained from the normalization controls after hybridization provide a control for variations in hybridization conditions, label intensity, “reading” efficiency and other factors that may cause the signal of a perfect hybridization to vary between arrays. In a preferred embodiment, signals (e.g., fluorescence intensity) read from all other probes in the array are divided by the signal (e.g., fluorescence intensity) from the control probes, thereby normalizing the measurements.
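The division described above can be sketched as follows. Averaging over several normalization control probes is an assumption made for this illustration (the text only states that probe signals are divided by the control signal), and the probe names and intensities are hypothetical.

```python
def normalize_intensities(probe_signals, control_signals):
    """Divide each probe signal by the mean normalization-control signal.

    probe_signals: dict mapping probe name -> raw intensity.
    control_signals: list of raw intensities from normalization controls.
    """
    if not control_signals:
        raise ValueError("at least one control signal is required")
    reference = sum(control_signals) / len(control_signals)
    if reference <= 0:
        raise ValueError("mean control signal must be positive")
    return {name: signal / reference for name, signal in probe_signals.items()}


raw = {"gene_1": 300.0, "gene_2": 150.0}   # hypothetical raw intensities
controls = [100.0, 200.0]                   # hypothetical control intensities
print(normalize_intensities(raw, controls))
```

After this step, values from different arrays are expressed relative to the same spiked-in reference and can be compared with each other.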

Virtually any probe may serve as a normalization control. However, it is recognized that hybridization efficiency varies with base composition and probe length. Preferred normalization probes are selected to reflect the average length of the other probes present in the array; however, they can be selected to cover a range of lengths. The normalization control(s) can also be selected to reflect the (average) base composition of the other probes in the array. In a preferred embodiment, however, only one or a few normalization probes are used and they are selected such that they hybridize well (i.e., no secondary structure) and do not match any target-specific probes.

Expression level controls are probes that hybridize specifically with constitutively expressed genes in the biological sample. Virtually any constitutively expressed gene provides a suitable target for expression level controls. Typically expression level control probes have sequences complementary to subsequences of constitutively expressed “housekeeping genes” including, but not limited to, the β-actin gene, the transferrin receptor gene, the GAPDH gene, and the like.

Mismatch controls may also be provided for the probes to the target genes, for expression level controls or for normalization controls. Mismatch controls are oligonucleotide probes or other nucleic acid probes identical to their corresponding test or control probes except for the presence of one or more mismatched bases. A mismatched base is a base selected so that it is not complementary to the corresponding base in the target sequence to which the probe would otherwise specifically hybridize. One or more mismatches are selected such that under appropriate hybridization conditions (e.g., stringent conditions) the test or control probe would be expected to hybridize with its target sequence, but the mismatch probe would not hybridize (or would hybridize to a significantly lesser extent). Preferred mismatch probes contain a central mismatch. Thus, for example, where a probe is a 20 mer, a corresponding mismatch probe will have the identical sequence except for a single base mismatch (e.g., substituting a G, a C or a T for an A) at any of positions 6 through 14 (the central mismatch).
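A central mismatch probe can be derived from a perfect match probe as in the sketch below. The choice of substitute base (each base is swapped for its Watson-Crick partner) and the default of the middle position are assumptions made for illustration; the text allows any non-complementary base at any of the central positions.

```python
# Assumed substitution scheme: replace a base with its Watson-Crick partner,
# which guarantees the new base differs from the original.
_SUBSTITUTE = {"A": "T", "T": "A", "G": "C", "C": "G"}


def central_mismatch_probe(probe, position=None):
    """Return a copy of `probe` with one base changed.

    position is 1-based; if omitted, the middle position is used
    (position 10 for a 20-mer, inside the 6-14 window in the text).
    """
    probe = probe.upper()
    if position is None:
        position = (len(probe) + 1) // 2
    if not 1 <= position <= len(probe):
        raise ValueError("position is outside the probe")
    index = position - 1
    return probe[:index] + _SUBSTITUTE[probe[index]] + probe[index + 1:]


perfect_match = "ACGTACGTACGTACGTACGT"  # hypothetical 20-mer
print(central_mismatch_probe(perfect_match))
```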

Mismatch probes thus provide a control for non-specific binding or cross-hybridization to a nucleic acid in the sample other than the target to which the probe is directed. Mismatch probes thus indicate whether a hybridization is specific or not. For example, if the target is present the perfect match probes should be consistently brighter than the mismatch probes. In addition, if all central mismatches are present, the mismatch probes can be used to detect a mutation. The difference in intensity between the perfect match and the mismatch probe (I(PM)-I(MM)) provides a good measure of the concentration of the hybridized material.
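The I(PM)-I(MM) measure can be computed per probe pair and summarized for a gene, as sketched below. Averaging the differences over all probe pairs for a gene and reporting the fraction of pairs in which the perfect match is brighter are illustrative choices, not steps stated in the text; the intensities are hypothetical.

```python
def average_difference(pairs):
    """Mean of I(PM) - I(MM) over a list of (pm, mm) intensity pairs."""
    if not pairs:
        raise ValueError("at least one probe pair is required")
    return sum(pm - mm for pm, mm in pairs) / len(pairs)


def fraction_pm_brighter(pairs):
    """Fraction of probe pairs in which the perfect match is brighter."""
    if not pairs:
        raise ValueError("at least one probe pair is required")
    return sum(1 for pm, mm in pairs if pm > mm) / len(pairs)


# Hypothetical (PM, MM) intensities for four probe pairs of one gene.
pairs = [(900.0, 300.0), (700.0, 200.0), (400.0, 450.0), (800.0, 300.0)]
print(average_difference(pairs), fraction_pm_brighter(pairs))
```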

The high density array may also include sample preparation/amplification control probes. These are probes that are complementary to subsequences of control genes selected because they do not normally occur in the nucleic acids of the particular biological sample being assayed. Suitable sample preparation/amplification control probes include, for example, probes to bacterial genes (e.g., Bio B) where the sample in question is a biological sample from a eukaryote.

The RNA sample is then spiked with a known amount of the nucleic acid to which the sample preparation/amplification control probe is directed before processing. Quantification of the hybridization of the sample preparation/amplification control probe then provides a measure of alteration in the abundance of the nucleic acids caused by processing steps (e.g., PCR, reverse transcription, in vitro transcription, etc.).

In a preferred embodiment, oligonucleotide probes in the high density array are selected to bind specifically to the nucleic acid target to which they are directed with minimal non-specific binding or cross-hybridization under the particular hybridization conditions utilized. Because the high density arrays of this invention can contain in excess of 1,000,000 different probes, it is possible to provide every probe of a characteristic length that binds to a particular nucleic acid sequence. Thus, for example, the high density array can contain every possible 20-mer sequence complementary to an IL-2 mRNA.

However, there may exist 20-mer subsequences that are not unique to the IL-2 mRNA. Probes directed to these subsequences are expected to cross-hybridize with occurrences of their complementary sequence in other regions of the sample genome. Similarly, other probes simply may not hybridize effectively under the hybridization conditions (e.g., due to secondary structure, or interactions with the substrate or other probes). Thus, in a preferred embodiment, the probes that show such poor specificity or hybridization efficiency are identified and may not be included either in the high density array itself (e.g., during fabrication of the array) or in the post-hybridization data analysis.
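Enumerating every subsequence of a fixed length and discarding those that also occur elsewhere can be sketched as below. Exact substring matching against a list of background sequences is a deliberate simplification of cross-hybridization (it ignores partial matches, secondary structure and substrate effects), and the short sequences are hypothetical stand-ins for a transcript and the rest of the sample.

```python
def tile_subsequences(target, k=20):
    """Return every length-k subsequence of `target`, in order."""
    return [target[i:i + k] for i in range(len(target) - k + 1)]


def unique_subsequences(target, background, k=20):
    """Keep only the length-k subsequences of `target` that do not occur
    verbatim in any of the `background` sequences."""
    return [
        sub for sub in tile_subsequences(target, k)
        if all(sub not in other for other in background)
    ]


target = "ACGTACGG"         # hypothetical transcript fragment
background = ["TTACGTTT"]   # hypothetical unrelated sequence
print(tile_subsequences(target, k=4))
print(unique_subsequences(target, background, k=4))
```

A target of length n yields n - k + 1 candidate subsequences; the probe for each retained subsequence is its complement.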

In addition, in a preferred embodiment, expression monitoring arrays are used to identify the presence and expression (transcription) level of genes which are several hundred base pairs long. For most applications it would be useful to identify the presence, absence, or expression level of several thousand to one hundred thousand genes. Because the number of oligonucleotides per array is limited in a preferred embodiment, it is desired to include only a limited set of probes specific to each gene whose expression is to be detected.

U.S. application Ser. No. 08/772,376 discloses that probes as short as 15, 20, or 25 nucleotides are sufficient to hybridize to a subsequence of a gene and that, for most genes, there is a set of probes that performs well across a wide range of target nucleic acid concentrations. In a preferred embodiment, it is desirable to choose a preferred or “optimum” subset of probes for each gene before synthesizing the high density array.

2. Forming High Density Arrays

Methods of forming high density arrays of oligonucleotides, peptides and other polymer sequences with a minimal number of synthetic steps are known. The oligonucleotide analogue array can be synthesized on a solid substrate by a variety of methods, including, but not limited to, light-directed chemical coupling, and mechanically directed coupling. See Pirrung et al., U.S. Pat. No. 5,143,854 (see also PCT Application No. WO 90/15070) and Fodor et al., PCT Publication Nos. WO 92/10092 and WO 93/09668 and U.S. Ser. No. 07/980,523 which disclose methods of forming vast arrays of peptides, oligonucleotides and other molecules using, for example, light-directed synthesis techniques. See also, Fodor et al., Science, 251, 767-77 (1991). These procedures for synthesis of polymer arrays are now referred to as VLSIPS™ procedures. Using the VLSIPS™ approach, one heterogeneous array of polymers is converted, through simultaneous coupling at a number of reaction sites, into a different heterogeneous array. See, U.S. application Ser. Nos. 07/796,243 and 07/980,523.

The development of VLSIPS™ technology as described in the above-noted U.S. Pat. No. 5,143,854 and PCT patent publication Nos. WO 90/15070 and 92/10092, is considered pioneering technology in the fields of combinatorial synthesis and screening of combinatorial libraries. More recently, patent application Ser. No. 08/082,937, filed Jun. 25, 1993, describes methods for making arrays of oligonucleotide probes that can be used to check or determine a partial or complete sequence of a target nucleic acid and to detect the presence of a nucleic acid containing a specific oligonucleotide sequence.

In brief, the light-directed combinatorial synthesis of oligonucleotide arrays on a glass surface proceeds using automated phosphoramidite chemistry and chip masking techniques. In one specific implementation, a glass surface is derivatized with a silane reagent containing a functional group, e.g., a hydroxyl or amine group blocked by a photolabile protecting group. Photolysis through a photolithographic mask is used selectively to expose functional groups which are then ready to react with incoming 5′-photoprotected nucleoside phosphoramidites. The phosphoramidites react only with those sites which are illuminated (and thus exposed by removal of the photolabile blocking group). Thus, the phosphoramidites only add to those areas selectively exposed from the preceding step. These steps are repeated until the desired array of sequences has been synthesized on the solid surface. Combinatorial synthesis of different oligonucleotide analogues at different locations on the array is determined by the pattern of illumination during synthesis and the order of addition of coupling reagents.
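How the illumination pattern and the order of reagent addition together determine the sequence at each location can be illustrated with a toy simulation. This is only a bookkeeping sketch under simplifying assumptions (perfect deprotection and coupling, sites indexed by integers); the function and the mask patterns are hypothetical. Each string records bases in the order they were coupled at that site.

```python
def simulate_light_directed_synthesis(n_sites, cycles):
    """Simulate masked coupling cycles on an array of n_sites locations.

    cycles: list of (base, illuminated_sites) tuples applied in order.
    In each cycle only the illuminated sites are deprotected, so only
    those sites couple the incoming base.
    """
    sites = [""] * n_sites
    for base, illuminated_sites in cycles:
        for site in illuminated_sites:
            sites[site] += base
    return sites


# Three coupling cycles with different hypothetical masks.
cycles = [("A", [0, 1]), ("C", [1, 2]), ("G", [0, 2])]
print(simulate_light_directed_synthesis(3, cycles))
```

With three cycles and three masks, three different dimers are produced, which shows why the number of distinct sequences can grow much faster than the number of coupling steps.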

In the event that an oligonucleotide analogue with a polyamide backbone is used in the VLSIPS™ procedure, it is generally inappropriate to use phosphoramidite chemistry to perform the synthetic steps, since the monomers do not attach to one another via a phosphate linkage. Instead, peptide synthetic methods are substituted. See, e.g., Pirrung et al. U.S. Pat. No. 5,143,854.

Peptide nucleic acids, which comprise a polyamide backbone and the bases found in naturally occurring nucleosides, are commercially available from, e.g., Biosearch, Inc. (Bedford, Mass.). Peptide nucleic acids are capable of binding to nucleic acids with high specificity, and are considered “oligonucleotide analogues” for purposes of this disclosure.

In addition to the foregoing, additional methods which can be used to generate an array of oligonucleotides on a single substrate are described in co-pending applications Ser. No. 07/980,523, filed Nov. 20, 1992, and 07/796,243, filed Nov. 22, 1991 and in PCT Publication No. WO 93/09668. In the methods disclosed in these applications, reagents are delivered to the substrate by either (1) flowing within a channel defined on predefined regions or (2) “spotting” on predefined regions or (3) through the use of photoresist. However, other approaches, as well as combinations of spotting and flowing, may be employed. In each instance, certain activated regions of the substrate are mechanically separated from other regions when the monomer solutions are delivered to the various reaction sites.

A typical “flow channel” method applied to the compounds and libraries of the present invention can generally be described as follows. Diverse polymer sequences are synthesized at selected regions of a substrate or solid support by forming flow channels on a surface of the substrate through which appropriate reagents flow or in which appropriate reagents are placed. For example, assume a monomer “A” is to be bound to the substrate in a first group of selected regions. If necessary, all or part of the surface of the substrate in all or a part of the selected regions is activated for binding by, for example, flowing appropriate reagents through all or some of the channels, or by washing the entire substrate with appropriate reagents. After placement of a channel block on the surface of the substrate, a reagent having the monomer A flows through or is placed in all or some of the channel(s). The channels provide fluid contact to the first selected regions, thereby binding the monomer A on the substrate directly or indirectly (via a spacer) in the first selected regions.

Thereafter, a monomer B is coupled to second selected regions, some of which may be included among the first selected regions. The second selected regions will be in fluid contact with a second flow channel(s) through translation, rotation, or replacement of the channel block on the surface of the substrate; through opening or closing a selected valve; or through deposition of a layer of chemical or photoresist. If necessary, a step is performed for activating at least the second regions. Thereafter, the monomer B is flowed through or placed in the second flow channel(s), binding monomer B at the second selected locations. In this particular example, the resulting sequences bound to the substrate at this stage of processing will be, for example, A, B, and AB. The process is repeated to form a vast array of sequences of desired length at known locations on the substrate.

After the substrate is activated, monomer A can be flowed through some of the channels, monomer B can be flowed through other channels, a monomer C can be flowed through still other channels, etc. In this manner, many or all of the reaction regions are reacted with a monomer before the channel block must be moved or the substrate must be washed and/or reactivated. By making use of many or all of the available reaction regions simultaneously, the number of washing and activation steps can be minimized.

One of skill in the art will recognize that there are alternative methods of forming channels or otherwise protecting a portion of the surface of the substrate. For example, according to some embodiments, a protective coating such as a hydrophilic or hydrophobic coating (depending upon the nature of the solvent) is utilized over portions of the substrate to be protected, sometimes in combination with materials that facilitate wetting by the reactant solution in other regions. In this manner, the flowing solutions are further prevented from passing outside of their designated flow paths.

High density nucleic acid arrays can be fabricated by depositing presynthesized or natural nucleic acids in predetermined positions. Synthesized or natural nucleic acids are deposited on specific locations of a substrate by light directed targeting and oligonucleotide directed targeting. Nucleic acids can also be directed to specific locations in much the same manner as the flow channel methods. For example, a nucleic acid A can be delivered to and coupled with a first group of reaction regions which have been appropriately activated. Thereafter, a nucleic acid B can be delivered to and reacted with a second group of activated reaction regions. Nucleic acids are deposited in selected regions. Another embodiment uses a dispenser that moves from region to region to deposit nucleic acids in specific spots. Typical dispensers include a micropipette or capillary pin to deliver nucleic acid to the substrate and a robotic system to control the position of the micropipette with respect to the substrate. In other embodiments, the dispenser includes a series of tubes, a manifold, an array of pipettes or capillary pins, or the like so that various reagents can be delivered to the reaction regions simultaneously.

3. Hybridization

Nucleic acid hybridization simply involves contacting a probe and target nucleic acid under conditions where the probe and its complementary target can form stable hybrid duplexes through complementary base pairing. The nucleic acids that do not form hybrid duplexes are then washed away leaving the hybridized nucleic acids to be detected, typically through detection of an attached detectable label. It is generally recognized that nucleic acids are denatured by increasing the temperature or decreasing the salt concentration of the buffer containing the nucleic acids. Under low stringency conditions (e.g., low temperature and/or high salt) hybrid duplexes (e.g., DNA:DNA, RNA:RNA, or RNA:DNA) will form even where the annealed sequences are not perfectly complementary. Thus specificity of hybridization is reduced at lower stringency. Conversely, at higher stringency (e.g., higher temperature or lower salt) successful hybridization requires fewer mismatches.

One of skill in the art will appreciate that hybridization conditions may be selected to provide any degree of stringency. In a preferred embodiment, hybridization is performed at low stringency, in this case in 6× SSPE-T at 37° C (0.005% Triton X-100), to ensure hybridization, and then subsequent washes are performed at higher stringency (e.g., 1× SSPE-T at 37° C) to eliminate mismatched hybrid duplexes. Successive washes may be performed at increasingly higher stringency (e.g., down to as low as 0.25× SSPE-T at 37° C to 50° C) until a desired level of hybridization specificity is obtained. Stringency can also be increased by addition of agents such as formamide. Hybridization specificity may be evaluated by comparison of hybridization to the test probes with hybridization to the various controls that can be present (e.g., expression level control, normalization control, mismatch controls, etc.).

In general, there is a tradeoff between hybridization specificity (stringency) and signal intensity. Thus, in a preferred embodiment, the wash is performed at the highest stringency that produces consistent results and that provides a signal intensity greater than approximately 10% of the background intensity. Thus, in a preferred embodiment, the hybridized array may be washed at successively higher stringency solutions and read between each wash. Analysis of the data sets thus produced will reveal a wash stringency above which the hybridization pattern is not appreciably altered and which provides adequate signal for the particular oligonucleotide probes of interest.
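The selection of a wash condition from successive reads can be sketched as follows. The sketch applies only the signal criterion stated above (signal greater than approximately 10% of background) and does not model the consistency of the hybridization pattern between washes; the wash labels and readings are hypothetical.

```python
def choose_wash(readings, min_fraction=0.10):
    """Pick the highest-stringency wash that still gives adequate signal.

    readings: list of (label, signal, background) tuples ordered from the
    lowest to the highest stringency wash.
    Returns the label of the last wash whose signal exceeds
    min_fraction * background, or None if no wash qualifies.
    """
    chosen = None
    for label, signal, background in readings:
        if signal > min_fraction * background:
            chosen = label
    return chosen


# Hypothetical reads taken after each successive wash.
readings = [
    ("6x SSPE-T", 500.0, 100.0),
    ("1x SSPE-T", 200.0, 100.0),
    ("0.25x SSPE-T", 5.0, 100.0),
]
print(choose_wash(readings))
```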

In a preferred embodiment, background signal is reduced by the use of a detergent (e.g., C-TAB) or a blocking reagent (e.g., sperm DNA, cot-1 DNA, etc.) during the hybridization to reduce non-specific binding. In a particularly preferred embodiment, the hybridization is performed in the presence of about 0.5 mg/ml DNA (e.g., herring sperm DNA). The use of blocking agents in hybridization is well known to those of skill in the art (see, e.g., Chapter 8 in P. Tijssen, supra).

The stability of duplexes formed between RNAs or DNAs is generally in the order of RNA:RNA>RNA:DNA>DNA:DNA, in solution. Long probes have better duplex stability with a target, but poorer mismatch discrimination than shorter probes (mismatch discrimination refers to the measured hybridization signal ratio between a perfect match probe and a single base mismatch probe). Shorter probes (e.g., 8-mers) discriminate mismatches very well, but the overall duplex stability is low.

Altering the thermal stability (T_m) of the duplex formed between the target and the probe using, e.g., known oligonucleotide analogues allows for optimization of duplex stability and mismatch discrimination. One useful aspect of altering the T_m arises from the fact that adenine-thymine (A-T) duplexes have a lower T_m than guanine-cytosine (G-C) duplexes, due in part to the fact that the A-T duplexes have 2 hydrogen bonds per base-pair, while the G-C duplexes have 3 hydrogen bonds per base pair. In heterogeneous oligonucleotide arrays in which there is a non-uniform distribution of bases, it is not generally possible to optimize hybridization for each oligonucleotide probe simultaneously. Thus, in some embodiments, it is desirable to selectively destabilize G-C duplexes and/or to increase the stability of A-T duplexes. This can be accomplished, e.g., by substituting guanine residues in the probes of an array which form G-C duplexes with hypoxanthine, or by substituting adenine residues in probes which form A-T duplexes with 2,6 diaminopurine or by using the salt tetramethyl ammonium chloride (TMACl) in place of NaCl.
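The dependence of duplex stability on base composition can be illustrated with the Wallace rule, a common rule of thumb for short oligonucleotides that estimates the melting temperature as 2° C per A or T and 4° C per G or C. This rule is not part of the present disclosure and is used here only as a rough illustration; it ignores salt concentration, probe analogues and surface effects. The probe sequences are hypothetical.

```python
def wallace_tm(probe):
    """Rough melting temperature estimate (degrees C) for a short DNA
    oligonucleotide using the Wallace rule: 2*(A+T) + 4*(G+C)."""
    probe = probe.upper()
    at_count = probe.count("A") + probe.count("T")
    gc_count = probe.count("G") + probe.count("C")
    return 2 * at_count + 4 * gc_count


# Two hypothetical 10-mers of equal length but different composition.
print(wallace_tm("ATATATATAT"))
print(wallace_tm("GCGCGCGCGC"))
```

Two probes of equal length can thus differ substantially in estimated stability, which is why a single hybridization condition cannot be optimal for every probe on a heterogeneous array.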

Altered duplex stability conferred by using oligonucleotide analogue probes can be ascertained by following, e.g., fluorescence signal intensity of oligonucleotide analogue arrays hybridized with a target oligonucleotide over time. The data allow optimization of specific hybridization conditions at, e.g., room temperature (for simplified diagnostic applications in the future).

Another way of verifying altered duplex stability is by following the signal intensity generated upon hybridization with time. Previous experiments using DNA targets and DNA chips have shown that signal intensity increases with time, and that the more stable duplexes generate higher signal intensities faster than less stable duplexes. The signals reach a plateau or “saturate” after a certain amount of time due to all of the binding sites becoming occupied. These data allow for optimization of hybridization, and determination of the best conditions at a specified temperature.

Methods of optimizing hybridization conditions are well known to those of skill in the art (see, e.g., Laboratory Techniques in Biochemistry and Molecular Biology, Vol. 24: Hybridization With Nucleic Acid Probes, P. Tijssen, ed. Elsevier, N.Y., (1993)).

(C) Signal Detection

In a preferred embodiment, the hybridized nucleic acids are detected by detecting one or more labels attached to the sample nucleic acids. The labels may be incorporated by any of a number of means well known to those of skill in the art. However, in a preferred embodiment, the label is simultaneously incorporated during the amplification step in the preparation of the sample nucleic acids. Thus, for example, polymerase chain reaction (PCR) with labeled primers or labeled nucleotides will provide a labeled amplification product. In a preferred embodiment, transcription amplification, as described above, using a labeled nucleotide (e.g., fluorescein-labeled UTP and/or CTP) incorporates a label into the transcribed nucleic acids.

Alternatively, a label may be added directly to the original nucleic acid sample (e.g., mRNA, polyA mRNA, cDNA, etc.) or to the amplification product after the amplification is completed. Means of attaching labels to nucleic acids are well known to those of skill in the art and include, for example, nick translation or end-labeling (e.g., with a labeled RNA) by kinasing of the nucleic acid and subsequent attachment (ligation) of a nucleic acid linker joining the sample nucleic acid to a label (e.g., a fluorophore).

Detectable labels suitable for use in the present invention include any composition detectable by spectroscopic, photochemical, biochemical, immunochemical, electrical, optical or chemical means. Useful labels in the present invention include biotin for staining with labeled streptavidin conjugate, magnetic beads (e.g., Dynabeads™), fluorescent dyes (e.g., fluorescein, Texas red, rhodamine, green fluorescent protein, and the like), radiolabels (e.g., ³H, ¹²⁵I, ³⁵S, ¹⁴C, or ³²P), enzymes (e.g., horse radish peroxidase, alkaline phosphatase and others commonly used in an ELISA), and colorimetric labels such as colloidal gold or colored glass or plastic (e.g., polystyrene, polypropylene, latex, etc.) beads. Patents teaching the use of such labels include U.S. Pat. Nos. 3,817,837; 3,850,752; 3,939,350; 3,996,345; 4,277,437; 4,275,149; and 4,366,241.

Means of detecting such labels are well known to those of skill in the art. Thus, for example, radiolabels may be detected using photographic film or scintillation counters, and fluorescent markers may be detected using a photodetector to detect emitted light. Enzymatic labels are typically detected by providing the enzyme with a substrate and detecting the reaction product produced by the action of the enzyme on the substrate, and colorimetric labels are detected by simply visualizing the colored label. One particularly preferred method uses a colloidal gold label that can be detected by measuring scattered light.

The label may be added to the target (sample) nucleic acid(s) prior to, or after, the hybridization. So-called “direct labels” are detectable labels that are directly attached to or incorporated into the target (sample) nucleic acid prior to hybridization. In contrast, so-called “indirect labels” are joined to the hybrid duplex after hybridization. Often, the indirect label is attached to a binding moiety that has been attached to the target nucleic acid prior to the hybridization. Thus, for example, the target nucleic acid may be biotinylated before the hybridization. After hybridization, an avidin-conjugated fluorophore will bind the biotin-bearing hybrid duplexes, providing a label that is easily detected. For a detailed review of methods of labeling nucleic acids and detecting labeled hybridized nucleic acids, see Laboratory Techniques in Biochemistry and Molecular Biology, Vol. 24: Hybridization With Nucleic Acid Probes, P. Tijssen, ed. Elsevier, N.Y., (1993).

Fluorescent labels are preferred and easily added during an in vitro transcription reaction. In a preferred embodiment, fluorescein-labeled UTP and CTP are incorporated into the RNA produced in an in vitro transcription reaction as described above.

Means of detecting labeled target (sample) nucleic acids hybridized to the probes of the high density array are known to those of skill in the art. Thus, for example, where a colorimetric label is used, simple visualization of the label is sufficient. Where a radioactively labeled probe is used, detection of the radiation (e.g. with photographic film or a solid state detector) is sufficient.

In a preferred embodiment, however, the target nucleic acids are labeled with a fluorescent label and the localization of the label on the probe array is accomplished with fluorescence microscopy. The hybridized array is excited with a light source at the excitation wavelength of the particular fluorescent label and the resulting fluorescence at the emission wavelength is detected. In a particularly preferred embodiment, the excitation light source is a laser appropriate for the excitation of the fluorescent label.

The confocal microscope may be automated with a computer-controlled stage to automatically scan the entire high density array. Similarly, the microscope may be equipped with a phototransducer (e.g., a photomultiplier, a solid state array, a CCD camera, etc.) attached to an automated data acquisition system to automatically record the fluorescence signal produced by hybridization to each oligonucleotide probe on the array. Such automated systems are described at length in U.S. Pat. No. 5,143,854, PCT Application WO 92/10092, and copending U.S. application Ser. No. 08/195,889 filed on Feb. 10, 1994. Use of laser illumination in conjunction with automated confocal microscopy for signal detection permits detection at a resolution of better than about 100 μm, more preferably better than about 50 μm, and most preferably better than about 25 μm.

One of skill in the art will appreciate that methods for evaluating the hybridization results vary with the nature of the specific probe nucleic acids used as well as the controls provided. In the simplest embodiment, the fluorescence intensity for each probe is simply quantified. This is accomplished by measuring probe signal strength at each location (representing a different probe) on the high density array (e.g., where the label is a fluorescent label, detection of the amount of fluorescence (intensity) produced by a fixed excitation illumination at each location on the array). Comparison of the absolute intensities of an array hybridized to nucleic acids from a “test” sample with intensities produced by a “control” sample provides a measure of the relative expression of the nucleic acids that hybridize to each of the probes.

One of skill in the art, however, will appreciate that hybridization signals will vary in strength with efficiency of hybridization, the amount of label on the sample nucleic acid and the amount of the particular nucleic acid in the sample. Typically, nucleic acids present at very low levels (e.g., <1 pM) will show a very weak signal. At some low level of concentration, the signal becomes virtually indistinguishable from background. In evaluating the hybridization data, a threshold intensity value may be selected below which a signal is not counted, being essentially indistinguishable from background.

Where it is desirable to detect nucleic acids expressed at lower levels, a lower threshold is chosen. Conversely, where only high expression levels are to be evaluated, a higher threshold level is selected. In a preferred embodiment, a suitable threshold is about 10% above that of the average background signal.
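By way of illustration only, the thresholding described above might be sketched as follows. The function names, the probe identifiers and the intensity values are hypothetical and are not part of the disclosed system; the 10% margin is the one named in the preceding paragraph.

```python
from statistics import mean


def background_threshold(background_signals, fraction_above=0.10):
    """Return a cutoff set a given fraction above the mean background signal."""
    return mean(background_signals) * (1.0 + fraction_above)


def signals_above_threshold(probe_signals, background_signals, fraction_above=0.10):
    """Keep only the probes whose intensity exceeds the background-derived cutoff."""
    cutoff = background_threshold(background_signals, fraction_above)
    return {probe: s for probe, s in probe_signals.items() if s > cutoff}


# Hypothetical readings in arbitrary intensity units.
background = [100.0, 110.0, 90.0]
signals = {"probe_a": 105.0, "probe_b": 150.0, "probe_c": 111.0}

kept = signals_above_threshold(signals, background)
print(sorted(kept))
```

Raising or lowering `fraction_above` corresponds to selecting a higher or lower threshold as described above.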

In addition, the provision of appropriate controls permits a more detailed analysis that controls for variations in hybridization conditions, cell health, non-specific binding and the like. Thus, for example, in a preferred embodiment, the hybridization array is provided with normalization controls. These normalization controls are probes complementary to control sequences added in a known concentration to the sample. Where the overall hybridization conditions are poor, the normalization controls will show a smaller signal reflecting reduced hybridization. Conversely, where hybridization conditions are good, the normalization controls will provide a higher signal reflecting the improved hybridization. Normalization of the signal derived from other probes in the array to the normalization controls thus provides a control for variations in hybridization conditions. Typically, normalization is accomplished by dividing the measured signal from the other probes in the array by the average signal produced by the normalization controls. Normalization may also include correction for variations due to sample preparation and amplification. Such normalization may be accomplished by dividing the measured signal by the average signal from the sample preparation/amplification control probes (e.g., the Bio B probes). The resulting values may be multiplied by a constant value to scale the results.
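The normalization just described (division by the average control signal, followed by optional scaling) can be sketched as below. This is a minimal illustration under stated assumptions: the function name, the gene labels, the control values and the scale constant are all hypothetical.

```python
from statistics import mean


def normalize(probe_signals, control_signals, scale=1.0):
    """Divide each probe signal by the mean control signal, then apply a scale factor."""
    control_mean = mean(control_signals)
    if control_mean <= 0:
        raise ValueError("mean control signal must be positive")
    return {probe: scale * s / control_mean for probe, s in probe_signals.items()}


# Hypothetical raw intensities and normalization-control readings.
raw = {"gene_x": 400.0, "gene_y": 50.0}
controls = [180.0, 200.0, 220.0]

normalized = normalize(raw, controls, scale=100.0)
print(normalized)
```

The same function could be applied a second time with the sample preparation/amplification control signals in place of `controls`.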

As indicated above, the high density array can include mismatch controls. In a preferred embodiment, there is a mismatch control having a central mismatch for every probe (except the normalization controls) in the array. It is expected that after washing in stringent conditions, where a perfect match would be expected to hybridize to the probe, but not to the mismatch, the signal from the mismatch controls should only reflect non-specific binding or the presence in the sample of a nucleic acid that hybridizes with the mismatch. Where the probe in question and its corresponding mismatch control both show high signals, or the mismatch shows a higher signal than its corresponding test probe, there is a problem with the hybridization and the signal from those probes is ignored. The difference in hybridization signal intensity between the target-specific probe and its corresponding mismatch control is a measure of the discrimination of the target-specific probe. Thus, in a preferred embodiment, the signal of the mismatch probe is subtracted from the signal from its corresponding test probe to provide a measure of the signal due to specific binding of the test probe.

The concentration of a particular sequence can then be determined by measuring the signal intensity of each of the probes that bind specifically to that gene and normalizing to the normalization controls. Where the signal from the probes is greater than the mismatch, the mismatch is subtracted. Where the mismatch intensity is equal to or greater than its corresponding test probe, the signal is ignored. The expression level of a particular gene can then be scored by the number of positive signals (either absolute or above a threshold value), the intensity of the positive signals (either absolute or above a selected threshold value), or a combination of both metrics (e.g., a weighted average).
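As a rough sketch of the mismatch handling and scoring described above, the following subtracts the mismatch signal where the test probe is higher, ignores the pair otherwise, and reports both the number of positive signals and their mean intensity. The function names, the optional threshold and the example intensities are hypothetical.

```python
def specific_signals(probe_pairs):
    """Return test-minus-mismatch differences, ignoring pairs where mismatch >= test."""
    return [pm - mm for pm, mm in probe_pairs if pm > mm]


def score_gene(probe_pairs, threshold=0.0):
    """Return (number of positive signals, mean intensity of those signals)."""
    diffs = [d for d in specific_signals(probe_pairs) if d > threshold]
    positives = len(diffs)
    mean_intensity = sum(diffs) / positives if positives else 0.0
    return positives, mean_intensity


# Hypothetical (test probe, mismatch control) intensities for one gene.
pairs = [(500.0, 100.0), (300.0, 320.0), (250.0, 50.0), (80.0, 80.0)]

positives, mean_intensity = score_gene(pairs)
print(positives, mean_intensity)
```

A combined metric such as a weighted average could be built from the two returned values; the weighting is left open here because the text does not specify one.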

In some preferred embodiments, a computer system is used to compare the hybridization intensities of the perfect match and mismatch probes of each pair. If the gene is expressed, the hybridization intensity (or affinity) of a perfect match probe of a pair should be recognizably higher than that of the corresponding mismatch probe. Generally, if the hybridization intensities of a pair of probes are substantially the same, it may indicate that the gene is not expressed. However, the determination is not based on a single pair of probes; whether a gene is expressed is determined by an analysis of many pairs of probes.

After the system compares the hybridization intensities of the perfect match and mismatch probes, the system indicates expression of the gene. As an example, the system may indicate to a user that the gene is either present (expressed), marginal or absent (unexpressed). Specific procedures for data analysis are disclosed in U.S. application Ser. No. 08/772,376, previously incorporated for all purposes.
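A three-way call of this kind might be sketched as follows. This is not the procedure of the cited application, which is not reproduced here; the decision rule (the fraction of probe pairs in which the perfect match clearly exceeds the mismatch) and every cutoff value are assumptions chosen only to illustrate a call based on many probe pairs rather than one.

```python
def expression_call(probe_pairs, present_fraction=0.6, marginal_fraction=0.4, min_ratio=1.5):
    """Classify a gene as present, marginal or absent from its (PM, MM) probe pairs.

    All three cutoffs are hypothetical illustration values.
    """
    if not probe_pairs:
        raise ValueError("at least one probe pair is required")
    # Count pairs in which the perfect match is recognizably higher than the mismatch.
    positive = sum(1 for pm, mm in probe_pairs if pm > min_ratio * mm)
    fraction = positive / len(probe_pairs)
    if fraction >= present_fraction:
        return "present"
    if fraction >= marginal_fraction:
        return "marginal"
    return "absent"


# Hypothetical intensities for three genes.
print(expression_call([(300.0, 100.0)] * 4 + [(100.0, 100.0)]))
print(expression_call([(300.0, 100.0), (100.0, 100.0)]))
print(expression_call([(100.0, 100.0)] * 3))
```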

In addition to high density nucleic acid arrays, other methods are also useful for massive gene expression monitoring. Differential display, described by Liang, P. and Pardee, A. B. (Differential Display of eukaryotic messenger RNA by means of the polymerase chain reaction. Science 257:967-971, 1992, incorporated herein by reference for all purposes) provides a useful means for distinguishing gene expression between two samples. Serial analysis of gene expression, described by Velculescu et al. (Serial Analysis of Gene Expression. Science, 270:484-487, 1995, incorporated herein by reference for all purposes) provides another method for quantitative and qualitative analysis of gene expression. Optical fiber oligonucleotide sensors, described by Ferguson et al. (A Fiber-optic DNA biosensor microarray for the analysis of gene expression. Nature Biotechnology 14:1681-1684, 1996), can also be used for gene expression monitoring.

It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims. All publications, patents, and patent applications cited herein are hereby incorporated by reference for all purposes.

EXAMPLES

Summary

Mechanistic insights into viral replication and pathogenesis generally have come from the analysis of viral gene products, either by studying their biochemical activities and interactions individually or by creating mutant viruses and analyzing their phenotype. Now it is possible to identify and catalog the host cell genes whose mRNA levels change in response to the pathogen. We have employed DNA array technology to monitor the level of approximately 6600 human mRNAs in uninfected as compared to human cytomegalovirus-infected cells. The level of 258 mRNAs changed by a factor of 4 or more early after infection, before the onset of viral DNA replication. Several of these mRNAs encode gene products that might play key roles in virus-induced pathogenesis, identifying them as intriguing targets for further study.

Introduction

Human cytomegalovirus (HCMV) has the potential to alter cellular gene expression through multiple mechanisms. Its initial interaction with the cell surface could initiate a regulatory signal; indeed, the virion gB and gH glycoproteins induce cellular transcription factors when added to uninfected cells (1). Constituents of the virion, such as the tegument protein, pp71, migrate to the nucleus and activate transcription after infection (2), and viral proteins synthesized after infection, such as the immediate early 1 and 2 proteins, modulate transcription (3-5). The virus encodes several G protein-coupled receptors (6,7) that likely initiate gene regulatory signal cascades in response to ligands, and HCMV infection has been shown to perturb cell cycle regulation (8-11), which leads to changes in cellular gene expression. The complex virus-host cell interaction has the potential to dramatically modulate the expression of cellular genes.

Relatively few cellular genes have been identified whose activity changes in HCMV-infected cells (12). Recently, we used differential display analysis to identify 15 interferon-inducible genes that are activated by the virus subsequent to infection (13). However, this screen identified only genes whose mRNA levels changed dramatically, and we were not able to perform the screen under a variety of conditions given its labor-intensive nature. In contrast to differential display, the DNA array assay is easily performed and can detect subtle changes in mRNA levels. We report the identification of 258 cellular mRNAs whose level changes by a factor of 4 or more before the onset of HCMV DNA replication.

Materials and Methods

Cells and Viruses.

Primary human foreskin fibroblasts at passage 10-15 were cultured in DMEM containing 10% fetal calf serum. After the cells remained at confluence for three days, they were infected at a multiplicity of 3 plaque-forming units per cell with HCMV AD169 or Toledo virions that were purified as described (14).

Sample Preparation and Analysis with DNA Arrays.

Biotinylated single-stranded antisense RNA samples for hybridization were prepared as described (15) with minor modifications. Total cellular RNA was prepared using the TRIZOL Reagent (GibcoBRL), polyadenylated RNA was isolated, and portions (5 μg) were used as the template for the first strand cDNA synthesis in a reaction that was primed with oligo(dT) containing a T7 RNA polymerase promoter sequence at its 5′ end [5′-GGCCAGTGAATTGTAATACGACTCACTATAGGGAGGCGG(T)₂₄-3′]. The second cDNA strand was synthesized using E. coli DNA polymerase I and ligase. The resulting cDNA (0.5-1 μg) was used as template to make a biotinylated RNA probe by in vitro transcription using the T7 Megascript System (Ambion). Unincorporated nucleotides were removed using a G-50 Quick Spin Column (Boehringer Mannheim). The labeled RNA was fragmented to an average size of 50-100 bases by incubating at 94° C. for 30 min in buffer containing 40 mM Tris-Ac, pH 8.1, 100 mM KOAc, and 30 mM MgOAc. The hybridization (15 h), washing and staining protocols were as described (15), and employed a set of four human gene chips (HUM6000 A, B, C and D, Affymetrix). The DNA arrays were scanned using a confocal scanner manufactured for Affymetrix by Molecular Dynamics.

Data analysis:

The data collected in each hybridization experiment was processed using the GeneChip™ software supplied with the Affymetrix instrumentation system. To evaluate if RNA corresponding to each of the 6600 genes encoded on the array was detectable or undetectable, a number of parameters were evaluated (15, 16), including the number of probe pairs interrogating each gene in which the intensity of the perfect match (PM) hybridization reaction exceeded that of the mismatch (MM) hybridization signal (cutoffs <45%) and the PM/MM ratios for each set of probe pairs. To determine the quantitative amounts of RNA from each gene, the average of the differences (PM-MM) for each probe pair in a probe set was calculated (cutoff >50) as well as the average differences across the probe sets (cutoff >10). The cutoff thresholds were determined empirically to be conservative, i.e., they minimized false positives. The change in the level of expression for any gene was considered significant if the change in the average differences across the probe sets was greater than 3-fold.
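The average-difference and fold-change calculations can be illustrated with the short sketch below. It is not the GeneChip™ software: the function names and intensities are hypothetical, and the floor applied to small or negative average differences before taking a ratio is an assumption added here so that the ratio stays defined.

```python
def average_difference(probe_pairs):
    """Mean of (PM - MM) over the probe pairs of one probe set."""
    return sum(pm - mm for pm, mm in probe_pairs) / len(probe_pairs)


def fold_change(avg_infected, avg_mock, floor=10.0):
    """Signed fold change: positive for an increase, negative for a decrease.

    Values below `floor` are raised to `floor` (an assumed convention).
    """
    a = max(avg_infected, floor)
    b = max(avg_mock, floor)
    return a / b if a >= b else -(b / a)


def is_significant(fc, min_fold=3.0):
    """Apply the greater-than-3-fold criterion stated in the text."""
    return abs(fc) > min_fold


# Hypothetical (PM, MM) intensities for one gene in two samples.
mock = [(200.0, 100.0), (300.0, 150.0)]
infected = [(900.0, 100.0), (700.0, 200.0)]

fc = fold_change(average_difference(infected), average_difference(mock))
print(fc, is_significant(fc))
```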

RNA Analysis by Northern Blot.

GeneChip™ results were confirmed by Northern blot assay. Total RNA (3 μg) from mock-infected cells or cells infected with the HCMV AD169 or Toledo strains was subjected to electrophoresis, blotted to a membrane and probed with random hexanucleotide-primed ³²P-labeled cDNA fragments from I.M.A.G.E. clones (Genome Systems, Inc.).

Results

The gene chip assay utilized a set of four probe arrays that together include oligonucleotides corresponding to more than 6600 human mRNAs (16). Each array (1.6 cm²) contains more than 65,000 features, and a different oligodeoxyribonucleotide (25 bases) is synthesized on the surface of the derivatized glass wafer within the boundaries of each feature using light-sensitive chemistry (17-21). The arrays contain 20 pairs of oligonucleotide probes corresponding to each RNA that is interrogated. Each probe pair consists of one 25-mer that is a perfect complement to the RNA (a perfect match probe) and a companion oligonucleotide that carries a single base difference in a central position (a mismatch probe). The mismatch probes serve as internal controls for hybridization specificity. Empirically derived rules used for the selection of oligonucleotide probes with the best sensitivity and specificity have been described (16).
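The relationship between a perfect match probe and its mismatch partner can be sketched as follows. The target sequence is made up, the function names are hypothetical, and the choice of which base to substitute at the central position (here, the complementary base) is an assumption; the text above specifies only a single base difference in a central position.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA-alphabet sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def probe_pair(target_region):
    """Return (perfect_match, mismatch) 25-mers for a 25-base target region."""
    if len(target_region) != 25:
        raise ValueError("target region must be 25 bases long")
    pm = reverse_complement(target_region.upper())
    center = len(pm) // 2  # index 12, i.e. the 13th base of the 25-mer
    # Assumed substitution rule: replace the central base with its complement.
    mm = pm[:center] + COMPLEMENT[pm[center]] + pm[center + 1:]
    return pm, mm


# Hypothetical 25-base target region written in the DNA alphabet.
pm, mm = probe_pair("ACGTACGTACGTACGTACGTACGTA")
print(pm)
print(mm)
```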

RNA samples were prepared for analysis at 40 min, 8 h and 24 h after mock-infection or HCMV (strain AD169) infection of primary human fibroblasts. Under these conditions, HCMV DNA replication begins between 24 and 36 h after infection (10), and the complete viral replication cycle requires about 72 h. Thus, all of the time points assayed were relatively early in the HCMV replication cycle. Biotinylated RNA target samples were generated by in vitro transcription of cDNA that was prepared from cellular mRNA using an oligo dT primer with a T7 polymerase promoter at its 5′ end. This protocol amplifies the mRNA population in an unbiased and reproducible fashion (16). The resulting antisense RNA was fragmented to an average size of 50 to 100 bases and hybridized to the oligonucleotide probe arrays, and the arrays were then reacted with phycoerythrin-conjugated streptavidin. The intensity of the fluorescent signal within each feature was then quantified using a confocal scanner (Affymetrix). Previous studies have demonstrated that the fluorescent signal is linearly related to the concentration of RNA target within the range of about 1 (1 part in 300,000) to 10³ copies of RNA per cell (16). Above 10³ copies per cell, the signal continues to increase, but in a nonlinear fashion because the oligonucleotide probes begin to saturate. RNAs corresponding to 3020-3380 out of the 6600 genes were detected in different experiments. The range is due in part to virus-induced changes. However, much of the variation is due to mRNAs expressed at the 1-10 copy per cell level, scoring as present in one assay and absent in another experiment.

The DNA arrays contain a set of 198 oligonucleotides corresponding to sequences spread across the entire length of the GAPDH mRNA. The target RNAs prepared at 8 h after infection with HCMV (FIG. 1A) or after mock infection (data not shown) hybridized to the complete GAPDH probe set. The arrays also included oligonucleotides spanning the actin mRNA, and target RNAs hybridized to this complete probe set, as well (data not shown). These controls demonstrated that the target RNA preparations span the entire length of the test gene and provided confidence that the cDNA synthesis and subsequent in vitro transcription generated target RNAs representative of the input mRNA.

The reproducibility of hybridization signals produced by independent preparations of target RNAs also was tested. Biotinylated target RNA was prepared from mock-infected cells (FIG. 1B) or at 8 h after infection (FIG. 1C) and hybridized to different sets of arrays. The concentration of only one cellular mRNA differed by more than a factor of 3 in the replicate experiments (FIG. 1B). This control demonstrates that the hybridization signals observed in independent experiments are highly reproducible. Further, the two preparations of infected cell target RNAs were prepared from infected primary fibroblasts derived from two different tissue samples, ruling out the possibility that changes in RNA levels might reflect genetic differences in the host cells. Differences greater than 3-fold observed for hybridization signals in comparisons of mock-infected versus infected cells should identify genes whose mRNA levels change after infection.

When target RNA preparations were compared at 40 min after mock or virus infection, the level of 27 mRNAs had changed in response to infection by a factor of three or more; at 8 and 24 h after infection, the number of altered mRNAs increased to 93 and 364, respectively (FIG. 2). Applying a more stringent four-fold cutoff, we generated a set of 258 mRNAs for further analysis (Table 1). Of these mRNAs, 124 increased and 134 decreased after infection. We assume that most changes result from altered transcriptional regulation, but we have not yet tested this supposition. We confirmed 49 (40%) of the mRNAs predicted to be increased and 23 (17%) of the mRNAs predicted to be decreased either by northern blot analysis of independent RNA preparations (representative results in FIG. 3) or by reference to earlier studies that demonstrated a change. All attempts to confirm a predicted alteration in the group of 258 mRNAs were successful.
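The counts reported above are internally consistent, as the following arithmetic check shows. The variable names are illustrative; the numbers are those given in the preceding paragraph.

```python
# Counts taken from the paragraph above.
induced, repressed = 124, 134
confirmed_induced, confirmed_repressed = 49, 23

total_changed = induced + repressed
pct_induced = 100 * confirmed_induced / induced        # about 39.5
pct_repressed = 100 * confirmed_repressed / repressed  # about 17.2

print(total_changed, round(pct_induced), round(pct_repressed))  # prints: 258 40 17
```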

We assayed changes in mRNA levels for a total of 58 genes in this study by northern blot. When we performed these assays, we included RNA preparations from cells infected with HCMV strain AD169, the laboratory-adapted strain used for the DNA array analysis, and HCMV strain Toledo, a clinical isolate that has not been extensively passaged in cultured cells (22). We observed the same alteration in mRNA level for both infections (representative results in FIG. 3). Although we might find some differences as more genes are assayed, our results to date argue that the laboratory and clinical isolates of HCMV alter cellular gene expression in a similar fashion.

Discussion

HCMV replicates in many different cell types within its infected host, some of which might respond to infection differently than the primary fibroblasts we have studied here. Keeping this caveat in mind, we nevertheless can speculate that several of the cellular genes whose mRNA levels change after infection of fibroblasts might profoundly influence HCMV replication and pathogenesis.

HLA-E mRNAs.

In order to protect infected cells from cytotoxic T lymphocytes, multiple HCMV gene products act to reduce cell surface expression of classical class I MHC molecules (23-28). Although these viral activities protect infected cells from cytotoxic T lymphocytes, they also have the potential to render infected cells susceptible to natural killer (NK) cells that can recognize and destroy cells that no longer express class I MHC molecules. HLA-E mRNA is induced by a factor of 19 at 24 h after infection (Table 1), whereas HLA-A, HLA-D and HLA-G family members that were represented in the DNA arrays were not changed (data not shown). HLA-E is a nonclassical class I molecule whose cell surface expression requires that it bind peptides derived from the signal sequences of other class I molecules (HLA-A, -B and -C) (29). Recently, it has been shown that NK cells recognize and spare target cells expressing HLA-E on their surface (30, 31). This recognition is mediated by the NK cell CD94-NKG2 cell surface receptor. Assuming that the elevated mRNA leads to elevated cell surface expression of HLA-E, this modulation should protect virus-infected cells from NK cell killing. This would be the second mechanism by which HCMV avoids NK cell surveillance. The viral UL18 protein is an MHC homologue that engages another receptor (NKIR) on the NK cell to avoid attack (32).

Ro/SSA 52 kDa mRNA.

HCMV-infected cells contain enhanced levels of the Ro/SSA 52 kDa mRNA (Table 1). The encoded protein is a constituent of a ribonucleoprotein complex, and its mRNA is induced by a factor of 12 at 24 h after infection. Autoantibodies to this protein are found in a variety of connective tissue diseases: commonly in systemic lupus erythematosus, neonatal lupus erythematosus, and Sjögren's syndrome, and less frequently in rheumatoid arthritis (33). There is good evidence that these autoantibodies play a direct pathogenic role in neonatal lupus erythematosus and subacute cutaneous lupus erythematosus (33, 34). However, the mechanism by which the immune system initially responds to Ro/SSA and other intracellular self-antigens is not clear. One popular hypothesis suggests that molecular mimicry is an important initiating mechanism, i.e., aspects of the immune response to a microbe cross-react with self-proteins (35). Conceivably, overexpression of a commonly targeted autoantigen, such as the Ro/SSA antigen in HCMV-infected cells, also could favor an autoimmune response. Although the Ro/SSA 52 kDa antigen is normally found in the nucleus and cytoplasm, it can be detected on the surface of peripheral lymphocytes that have been stressed by heat shock or treatment with ultraviolet light (36). Perhaps stress induced by HCMV infection also leads to cell surface presentation of Ro/SSA, facilitating an autoimmune response to the overexpressed antigen. Murine cytomegalovirus has been shown to induce autoimmune antibodies in infected mice (37-40), although Ro/SSA antibodies were not monitored in these studies.

Lipocortin 1, cPLA2 and COX-2 mRNAs.

Multiple constituents of the pathway that produces prostaglandin E2 from arachidonic acid are modulated by HCMV (Table 1). Cytosolic phospholipase A2 (cPLA2) mRNA increases by a factor of 12 and cyclooxygenase-2 (COX-2) mRNA is elevated by a factor of 7 at 24 h after infection. Lipocortin 1 (also known as annexin I) mRNA decreases by a factor of 9 at 24 h after infection. When cPLA2 is activated by phosphorylation, it translocates to membranes where it selectively cleaves and releases arachidonic acid; COX-2 then converts the arachidonic acid to prostaglandin E2. Lipocortin 1 inhibits the activation of cPLA2 (41). Thus, in HCMV-infected fibroblasts, the synthesis of prostaglandin E2 is activated by the induction of cPLA2 and COX-2 and the inhibition of the negative regulator lipocortin 1, assuming that the changes in mRNA levels translate to changes in active proteins. Further, HCMV infection has been shown to activate latent cPLA2 by inducing its phosphorylation (42). Thus, this pathway is strongly induced at both the transcriptional and posttranslational levels after infection, and this should lead to a marked increase in the production of prostaglandin E2. Prostaglandins serve as second messengers to stimulate a variety of responses, including inflammation. Perhaps the activation of this pathway is a cellular reaction to HCMV infection designed to induce a cell-mediated response that will kill the infected cell and thereby inhibit spread of the infection. Alternatively, one might speculate that the virus either induces the pathway or fails to antagonize the induction as a strategy to facilitate spread of the virus within the infected host. Inflammation might serve to lure monocytes and monocytic precursors to the vicinity of the infected cells where they can be infected. Cells of the monocytic lineage harbor HCMV on a long-term basis in a latent state (43-45).

It is possible that the concerted changes in cPLA2, COX-2 and lipocortin 1 are an indirect effect of HCMV gene action. IL-1β has been shown to regulate this set of genes (46) in the same manner as seen in infected cells. Although several reports have suggested that IL-1β activity is decreased in cultures of HCMV-infected monocytes (47, 48), the HCMV IE1 gene has been shown to induce the accumulation of IL-1β mRNA in transfected monocytes (49, 50). The IL-1β gene was not included in the oligonucleotide arrays assayed in this report, so we do not know if its mRNA is induced by infection of fibroblasts.

Thrombospondin-1 mRNA.

Thrombospondin-1 is a calcium-binding protein released upon platelet activation (51). It is a constituent of the extracellular matrix that regulates cell growth and differentiation, and it might potentiate tumor progression (52). Recently, thrombospondin-1-deficient mice have been produced (53) whose lungs exhibit acute and chronic cell infiltrates with increased fibroblastic and epithelial cell proliferation, matrix deposition and diffuse alveolar hemorrhage characteristic of pneumonia. HCMV causes a 21-fold reduction in the level of thrombospondin-1 mRNA by 24 h after infection (Table 1). Replication in the lung that leads to pneumonia is one of the principal consequences of active HCMV infection in immunosuppressed individuals (54). Given the phenotype of thrombospondin-1-deficient mice, one can speculate that the reduction in this mRNA might contribute to pneumonia induced by acute HCMV infection.

MITF mRNA.

The microphthalmia-associated transcription factor (MITF) is the product of the microphthalmia gene. Mice have been described with a variety of mutations in this gene (55), and the most severe manifestations of the mutations include microphthalmia, osteopetrosis and deafness. In the human, MITF mutations were identified in two families afflicted with Waardenburg syndrome type 2, which causes hearing loss and patchy pigmentation of the eyes, hair and skin (56). Infection of humans with HCMV early in pregnancy has been reported to cause anophthalmia (57), and congenital infection of mice with murine cytomegalovirus can cause microphthalmia (58). Modulation of MITF mRNA levels by the virus could contribute to these abnormalities. MITF mRNA is reduced by a factor of 4-8 at 24 h after HCMV infection of fibroblasts. Whereas the association of HCMV with eye abnormalities appears to be rare, congenital HCMV infection is a common cause of hearing loss. Conceivably, HCMV-induced hearing loss is a consequence of an inhibitory effect on MITF mRNA expression during development. This supposition is consistent with the observation that MITF mutations are associated with hearing loss in Waardenburg syndrome. HCMV could potentially modulate MITF in cells that are eventually killed or in cells where viral gene expression does not lead to cell death.

Conclusion.

The roles of the cellular genes discussed above in HCMV replication and pathogenesis remain highly speculative. Nevertheless, the ability to identify cellular genes whose functions provide tantalizing hints of potential mechanistic roles in infectious disease processes underscores the utility of gene array technology in the study of pathogens. The global analysis of changes in mRNA levels provides a catalog of genes that are modulated as a result of the host-pathogen interaction and therefore deserve further scrutiny. DNA array analysis provides an important new approach for the investigation of pathogenic mechanisms.

REFERENCES

1. Yurochko, A. D., Hwang, E. S., Rasmussen, L., Keay, S., Pereira, L. & Huang, E. S. (1997) J. Virol. 71, 5051-5059.

2. Liu, B. & Stinski, M. F. (1992) J. Virol. 66, 4434-4444.

3. Pizzorno, M. C., O'Hare, P., Sha, L., LaFemina, R. L. & Hayward, G. S. (1988) J. Virol. 62, 1167-1179.

4. Malone, C. L., Vesole, D. H. & Stinski, M. F. (1990) J. Virol. 64, 1498-1506.

5. Stenberg, R. M., Fortney, J., Barlow, S. W., Magrane, B. P., Nelson, J. A. & Ghazal, P. (1990) J. Virol. 64, 1556-1565.

6. Chee, M. S., Satchwell, S. C., Preddie, E., Weston, K. M. & Barrell, B. G. (1990) Nature 344, 774-777.

7. Welch, A. R., McGregor, L. M. & Gibson, W. (1991) J. Virol. 65, 3915-3918.

8. Jault, F. M., Jault, J.-M., Ruchti, R., Fortunato, E. A., Clark, C., Corbeil, J., Richman, D. D. & Spector, D. H. (1995) J. Virol. 69, 6697-6704.

9. Bresnahan, W. A., Boldogh, I., Thompson, E. A. & Albrecht, T. (1996) Virology 224, 150-160.

10. Lu, M. & Shenk, T. (1996) J. Virol. 70, 8850-8857.

11. Dittmer, D. & Mocarski, E. S. (1997) J. Virol. 71, 1629-1634.

12. Mocarski, E. S. (1996) in Fields Virology, eds. Fields, B. N., Knipe, D. M. & Howley, P. M. (Lippincott, Philadelphia), 3rd Ed., pp. 2447-2492.

13. Zhu, H., Cong, J.-P. & Shenk, T. (1997) Proc. Natl. Acad. Sci. USA 94, 13985-13990.

14. Baldick, C. J. & Shenk, T. (1996) J. Virol. 70, 6097-6105.

15. Wodicka, L., Dong, H., Mittmann, M., Ho, M. & Lockhart, D. J. (1997) Nature Biotech. 15, 1359-1367.

16. Lockhart, D. J., Dong, H., Byrne, M. C., Follette, M. T., Gallo, M. V., Chee, M. S., Mittmann, M., Wang, C., Kobayashi, M., Horton, H. & Brown, E. L. (1996) Nature Biotech. 14, 1675-1680.

17. Fodor, S. P. A., Read, J. L., Pirrung, M. C., Stryer, L., Lu, A. T. & Solas, D. (1991) Science 251, 767-773.

18. Fodor, S. P. A., Rava, R. P., Huang, X. C., Pease, A. C., Holmes, C. P. & Adams, C. L. (1993) Nature 364, 555-556.

19. Pease, A. C., Solas, D., Sullivan, E. J., Cronin, M. T., Holmes, C. P. & Fodor, S. P. A. (1994) Proc. Natl. Acad. Sci. USA 91, 5022-5026.

20. Lipshutz, R. J., Morris, D., Chee, M., Hubbell, E., Kozal, N. S., Shen, N., et al. (1995) BioTechniques 19, 442-447.

21. Chee, M., Yang, R., Hubbell, E., Berno, A., Huang, X. C., Stern, D., Winkler, J., Lockhart, D. J., Morris, M. S. & Fodor, S. P. A. (1996) Science 274, 610-614.

22. Quinnan, G. V., Jr., Delery, M., Rook, A. H., Frederick, W. R., Epstein, J. S., Manischewitz, J. F., Jackson, L., Ramsey, K. M., Mittal, K., Plotkin, S. A., et al. (1984) Ann. Intern. Med. 101, 478-483.

23. Ahn, K., Angulo, A., Ghazal, P., Peterson, P. A., Yang, Y. & Fruh, K. (1996) Proc. Natl. Acad. Sci. USA 93, 10990-10995.

24. Jones, T. R., Wiertz, E. J., Sun, L., Fish, K. N., Nelson, J. A. & Ploegh, H. L. (1996) Proc. Natl. Acad. Sci. USA 93, 11327-11333.

25. Wiertz, E. J., Jones, T. R., Sun, L., Bogyo, M., Geuze, H. J. & Ploegh, H. L. (1996) Cell 84, 769-779.

26. Ahn, K., Gruhler, A., Galocha, B., Jones, T. R., Wiertz, E. J., Ploegh, H. L., Peterson, P. A., Yang, Y. & Fruh, K. (1997) Immunity 6, 613-621.

27. Hengel, H., Koopmann, J. O., Flohr, T., Muranyi, W., Goulmy, E., Hammerling, G. J., Koszinowski, U. H. & Momburg, F. (1997) Immunity 6, 623-632.

28. Jones, T. R. & Sun, L. (1997) J. Virol. 71, 2970-2979.

29. Braud, V., Jones, E. Y. & McMichael, A. (1997) Eur. J. Immunol. 27, 1164-1169.

30. Borrego, F., Ulbrecht, M., Weiss, E. H., Coligan, J. E. & Brooks, A. G. (1998) J. Exp. Med. 187, 813-818.

31. Braud, V. M., Allan, D. S. J., O'Callaghan, C. A., Soderstrom, K., D'Andrea, A., Ogg, G. S., Lazetic, S., Young, N. T., Bell, J. I., Phillips, J. H., Lanier, L. L. & McMichael, A. J. (1998) Nature 391, 795-799.

32. Reyburn, H. T., Mandelboim, O., Vales-Gomez, M., Davis, D. M., Pazmany, L. & Strominger, J. L. (1997) Nature 386, 514-517.

33. Bouffard, P., Laniel, M.-A. & Boire, G. (1996) J. Rheumatol. 23, 1838-1841.

34. Finkelstein, Y., Adler, Y., Harel, L., Nussinovitch, M. & Youinou, P. (1997) Ann. Med. Interne (Paris) 148, 205-208.

35. Herrath, M. G. & Oldstone, M. B. A. (1996) Curr. Opin. Immunol. 8, 878-885.

36. Igarashi, T., Itoh, Y., Fukunaga, Y. & Yamamoto, M. (1995) Autoimmunity 22, 33-42.

37. O'Donoghue, H. L., Lawson, C. M. & Reed, W. D. (1990) Immunol. 71, 20-28.

38. Lawson, C. M., O'Donoghue, H. L., Farrell, H. E., Shellam, G. R. & Reed, W. D. (1991) Immunol. 72, 426-433.

39. Price, P., Olver, S. D., Gibbons, A. E. & Shellam, G. R. (1993) Immunol. 78, 14-21.

40. Chapman, A. J., Farrell, H. E., Thomas, J. A., Papadimitriou, J. M., Garlepp, M. J., Scalzo, A. A. & Shellam, G. R. (1994) Immunol. 81, 435-443.

41. Croxtall, J. D., Choudhury, Q., Newman, S. & Flower, R. J. (1996) Biochem. Pharmacol. 52, 351-356.

42. Shibutani, T., Johnson, T. M., Yu, Z. X., Ferrans, V. J., Moss, J. & Epstein, S. E. (1997) J. Clin. Invest. 100, 2054-2061.

43. Kondo, K., Xu, J. & Mocarski, E. S. (1996) Proc. Natl. Acad. Sci. USA 93, 11137-11142.

44. Sinclair, J. & Sissons, P. (1996) Intervirol. 39, 293-301.

45. Soderberg-Naucler, C., Fish, K. N. & Nelson, J. A. (1997) Cell 91, 119-126.

46. Croxtall, J. D., Newman, S. P., Choudhury, Q. & Flower, R. J. (1996) Biochem. Biophys. Res. Commun. 220, 491-495.

47. Rogers, B. C., Scott, D. M., Mundin, J. & Sissons, J. G. P. (1985) J. Virol. 55, 527-532.

48. Kapasi, K. & Rice, G. P. A. (1988) J. Virol. 62, 3603-3607.

49. Iwamoto, G. K., Monick, M. M., Clark, B. D., Auron, P. E., Stinski, M. F. & Hunninghake, G. W. (1990) J. Clin. Invest. 85, 1853-1857.

50. Crump, J. W., Geist, L. J., Auron, P. E., Webb, A. C., Stinski, M. F. & Hunninghake, G. W. (1992) Am. J. Respir. Cell. Mol. Biol. 6, 674-677.

51. Adams, J. C. (1997) Int. J. Biochem. Cell Biol. 29, 861-865.

52. Tuszynski, G. P. & Nicosia, R. F. (1995) BioEssays 18, 71-76.

53. Lawler, J., Sunday, M., Thibert, V., Duquette, M., George, E. L., Rayburn, H. & Hynes, R. O. (1998) J. Clin. Invest. 101, 982-992.

54. Britt, W. J. & Alford, C. A. (1996) in Fields Virology, eds. Fields, B. N., Knipe, D. M. & Howley, P. M. (Lippincott, Philadelphia), 3rd Ed., pp. 2493-2523.

55. Steingrimsson, E., Moore, K. J., Lamoreux, M. L., Ferre-D'Amare, A. R., Burley, S. K., Zimring, D. C. S., Skow, L. C., Hodgkinson, C. A., Arnheiter, H., Copeland, N. G. & Jenkins, N. A. (1994) Nature Genet. 8, 256-263.

56. Tassabehji, M., Newton, V. E. & Read, A. P. (1994) Nature Genet. 8, 251-255.

57. McCarthy, R. W., Frenkel, L. D., Kollarits, C. R. & Keys, M. P. (1980) Am. J. Ophthalmol. 90, 558-561.

58. Tsutsui, Y., Kashiwai, A., Kawamura, N. & Kadota, C. (1993) Am. J. Pathol. 143, 804-813.

The present invention provides greatly improved methods, compositions, and apparatus for identifying gene function and for studying the regulatory relationship among genes. It is to be understood that the above description is intended to be illustrative and not restrictive. Many variations of the invention will be apparent to those of skill in the art upon reviewing the above description. By way of example, the invention has been described primarily with reference to the use of a high density oligonucleotide array, but it will be readily recognized by those of skill in the art that other nucleic acid arrays, other methods of measuring transcript levels and gene expression monitoring at the protein level could be used. The scope of the invention should, therefore, be determined not with reference to the above description, but should instead be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.

1-45. (canceled)
46. A method for identifying a subset of genes which are improved targets for drug development, comprising the steps of: (a) comparing expression levels of at least two genes in two cell samples, wherein the two cell samples are the same but for differences caused by a selected environmental, genetic, disease, or developmental agent; (b) identifying a set of genes whose expression levels differ between the two cell samples; (c) searching a database to identify an unselected environmental agent, gene, disease, or developmental stage previously associated with expression or altered expression of individual members of the set of genes; (d) identifying a common biological feature between the selected environmental, genetic, disease, or developmental difference of step (a) and the unselected environmental agent, gene, disease, or developmental stage identified in step (c), wherein identification of a common biological feature between two environmental, genetic, disease, or developmental phenomena which both affect expression of a common gene identifies the common gene as being a member of a subset of genes which are improved targets for drug development.
47. A computer-readable medium having computer-executable instructions for performing steps comprising: (a) comparing expression levels of at least two genes in two cell samples, wherein the two cell samples are the same but for differences caused by a selected environmental, genetic, disease, or developmental agent; (b) identifying a set of genes whose expression levels differ between the two cell samples; (c) searching a database to identify an unselected environmental agent, gene, disease, or developmental stage previously associated with expression or altered expression of individual members of the set of genes; (d) identifying a common biological feature between the selected environmental, genetic, disease, or developmental difference of step (a) and the unselected environmental agent, gene, disease, or developmental stage identified in step (c), wherein identification of a common biological feature between two environmental, genetic, disease, or developmental phenomena which both affect expression of a common gene identifies the common gene as being a member of a subset of genes which are improved targets for drug development.
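
By way of non-limiting illustration, steps (a) through (d) recited above might be sketched in Python as follows. Every name here is hypothetical: the two functions, the 2-fold threshold, the in-memory dictionary standing in for the database of step (c), and the example intensities are assumptions made for this sketch, not a required or disclosed implementation. The MITF entry reflects the association with Waardenburg syndrome type 2 and hearing loss discussed in the description.

```python
def differing_genes(sample_1, sample_2, min_fold=2.0):
    """Steps (a)-(b): genes whose levels differ by at least min_fold.

    Genes with a non-positive reading in either sample are skipped,
    because no ratio can be formed for them in this simple sketch.
    """
    changed = set()
    for gene in sample_1.keys() & sample_2.keys():
        low, high = sorted((sample_1[gene], sample_2[gene]))
        if low > 0 and high / low >= min_fold:
            changed.add(gene)
    return changed


def shared_feature_targets(changed, associations, selected_features):
    """Steps (c)-(d): look up prior associations for each changed gene
    and keep genes whose associated condition shares a biological
    feature with the selected condition.

    associations: hypothetical stand-in for the database, mapping
        gene -> list of (condition, set of biological features).
    selected_features: features of the selected agent of step (a).
    """
    targets = {}
    for gene in sorted(changed):
        for condition, features in associations.get(gene, []):
            common = features & selected_features
            if common:
                targets.setdefault(gene, []).append((condition, sorted(common)))
    return targets


# Made-up intensities and a toy association table, for illustration only.
uninfected = {"MITF": 800.0, "GENE_X": 200.0}
hcmv_infected = {"MITF": 100.0, "GENE_X": 220.0}
associations = {
    "MITF": [("Waardenburg syndrome type 2",
              {"hearing loss", "patchy pigmentation"})],
    "GENE_X": [("hypothetical condition", {"fever"})],
}
changed = differing_genes(uninfected, hcmv_infected)
targets = shared_feature_targets(changed, associations, {"hearing loss"})
```

In this toy example only MITF passes the expression filter, and it is retained as a candidate target because its previously associated condition shares the feature "hearing loss" with the selected agent.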